[mlir][NFC] Move and improve ownership-based buffer dellocation docs (#89196)
Move the documentation of the ownership-based buffer deallocation pass to a separate file. Also improve the documentation a bit and insert a figure that explains the `bufferization.dealloc` op (copied from the tutorial at the LLVM Dev Summit 2023).
This commit is contained in:
parent
9462abdff1
commit
2a4e61b342
@ -1,5 +1,8 @@
|
|||||||
# Buffer Deallocation - Internals
|
# Buffer Deallocation - Internals
|
||||||
|
|
||||||
|
**Note:** This pass is deprecated. Please use the ownership-based buffer
|
||||||
|
deallocation pass instead.
|
||||||
|
|
||||||
This section covers the internal functionality of the BufferDeallocation
|
This section covers the internal functionality of the BufferDeallocation
|
||||||
transformation. The transformation consists of several passes. The main pass
|
transformation. The transformation consists of several passes. The main pass
|
||||||
called BufferDeallocation can be applied via “-buffer-deallocation” on MLIR
|
called BufferDeallocation can be applied via “-buffer-deallocation” on MLIR
|
||||||
|
@ -240,657 +240,6 @@ Alternatively,
|
|||||||
skips the analysis and inserts a copy on every buffer write, just like the
|
skips the analysis and inserts a copy on every buffer write, just like the
|
||||||
dialect conversion-based bufferization.
|
dialect conversion-based bufferization.
|
||||||
|
|
||||||
## Buffer Deallocation
|
|
||||||
|
|
||||||
**Important: this pass is deprecated, please use the ownership based buffer**
|
|
||||||
**deallocation pass instead**
|
|
||||||
|
|
||||||
One-Shot Bufferize deallocates all buffers that it allocates. This is in
|
|
||||||
contrast to the dialect conversion-based bufferization that delegates this job
|
|
||||||
to the
|
|
||||||
[`-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-buffer-deallocation-adds-all-required-dealloc-operations-for-all-allocations-in-the-input-program)
|
|
||||||
pass. By default, One-Shot Bufferize rejects IR where a newly allocated buffer
|
|
||||||
is returned from a block. Such IR will fail bufferization.
|
|
||||||
|
|
||||||
A new buffer allocation is returned from a block when the result of an op that
|
|
||||||
is not in destination-passing style is returned. E.g.:
|
|
||||||
|
|
||||||
```mlir
|
|
||||||
%0 = scf.if %c -> (tensor<?xf32>) {
|
|
||||||
%1 = tensor.generate ... -> tensor<?xf32>
|
|
||||||
scf.yield %1 : tensor<?xf32>
|
|
||||||
} else {
|
|
||||||
scf.yield %another_tensor : tensor<?xf32>
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
The `scf.yield` in the "else" branch is OK, but the `scf.yield` in the "then"
|
|
||||||
branch will be rejected.
|
|
||||||
|
|
||||||
Another case in which a buffer allocation may be returned is when a buffer copy
|
|
||||||
must be inserted due to a RaW conflict. E.g.:
|
|
||||||
|
|
||||||
```mlir
|
|
||||||
%0 = scf.if %c -> (tensor<?xf32>) {
|
|
||||||
%1 = tensor.insert %cst into %another_tensor[%idx] : tensor<?xf32>
|
|
||||||
"my_dialect.reading_tensor_op"(%another_tensor) : (tensor<?xf32>) -> ()
|
|
||||||
...
|
|
||||||
scf.yield %1 : tensor<?xf32>
|
|
||||||
} else {
|
|
||||||
scf.yield %yet_another_tensor : tensor<?xf32>
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
In the above example, a buffer copy of `buffer(%another_tensor)` (with `%cst`
|
|
||||||
inserted) is yielded from the "then" branch.
|
|
||||||
|
|
||||||
Note: Buffer allocations that are returned from a function are not deallocated.
|
|
||||||
It is the caller's responsibility to deallocate the buffer. For the full
|
|
||||||
function boundary ABI for MemRefs w.r.t. buffer deallocation refer to the
|
|
||||||
[*Function Boundary ABI*](#function-boundary-abi) section. In the future, this
|
|
||||||
could be automated with allocation hoisting (across function boundaries) or
|
|
||||||
reference counting.
|
|
||||||
|
|
||||||
One-Shot Bufferize leaks all memory and does not generate any buffer
|
|
||||||
deallocations. The `-buffer-deallocation-pipeline` has to be run afterwards to
|
|
||||||
insert the deallocation operations.
|
|
||||||
|
|
||||||
## Ownership-based Buffer Deallocation
|
|
||||||
|
|
||||||
Recommended compilation pipeline:
|
|
||||||
```
|
|
||||||
one-shot-bufferize
|
|
||||||
| it's recommended to perform all bufferization here at latest,
|
|
||||||
| <- any allocations inserted after this point have to be handled
|
|
||||||
V manually
|
|
||||||
expand-realloc
|
|
||||||
V
|
|
||||||
ownership-based-buffer-deallocation
|
|
||||||
V
|
|
||||||
canonicalize <- mostly for scf.if simplifications
|
|
||||||
V
|
|
||||||
buffer-deallocation-simplification
|
|
||||||
V <- from this point onwards no tensor values are allowed
|
|
||||||
lower-deallocations
|
|
||||||
V
|
|
||||||
CSE
|
|
||||||
V
|
|
||||||
canonicalize
|
|
||||||
```
|
|
||||||
|
|
||||||
One-Shot Bufferize does not deallocate any buffers that it allocates. This job
|
|
||||||
is delegated to the
|
|
||||||
[`-ownership-based-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-ownership-based-buffer-deallocation)
|
|
||||||
pass, i.e., after running One-Shot Bufferize, the result IR may have a number of
|
|
||||||
`memref.alloc` ops, but no `memref.dealloc` ops. This pass processes operations
|
|
||||||
implementing `FunctionOpInterface` one-by-one without analysing the call-graph.
|
|
||||||
This means, that there have to be [some rules](#function-boundary-abi) on how
|
|
||||||
MemRefs are handled when being passed from one function to another. The rest of
|
|
||||||
the pass revolves heavily around the `bufferization.dealloc` operation which is
|
|
||||||
inserted at the end of each basic block with appropriate operands and should be
|
|
||||||
optimized using the Buffer Deallocation Simplification pass
|
|
||||||
(`--buffer-deallocation-simplification`) and the regular canonicalizer
|
|
||||||
(`--canonicalize`). Lowering the result of the
|
|
||||||
`-ownership-based-buffer-deallocation` pass directly using
|
|
||||||
`--convert-bufferization-to-memref` without beforehand optimization is not
|
|
||||||
recommended as it will lead to very inefficient code (the runtime-cost of
|
|
||||||
`bufferization.dealloc` is `O(|memrefs|^2+|memref|*|retained|)`).
|
|
||||||
|
|
||||||
### Function boundary ABI
|
|
||||||
|
|
||||||
The Buffer Deallocation pass operates on the level of operations implementing
|
|
||||||
the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
|
|
||||||
also return them. To ensure compatibility among all functions (including
|
|
||||||
external ones), some rules have to be enforced:
|
|
||||||
* When a MemRef is passed as a function argument, ownership is never acquired.
|
|
||||||
It is always the caller's responsibility to deallocate such MemRefs.
|
|
||||||
* Returning a MemRef from a function always passes ownership to the caller,
|
|
||||||
i.e., it is also the caller's responsibility to deallocate memrefs returned
|
|
||||||
from a called function.
|
|
||||||
* A function must not return a MemRef with the same allocated base buffer as
|
|
||||||
one of its arguments (in this case a copy has to be created). Note that in
|
|
||||||
this context two subviews of the same buffer that don't overlap are also
|
|
||||||
considered to alias.
|
|
||||||
|
|
||||||
For external functions (e.g., library functions written externally in C), the
|
|
||||||
externally provided implementation has to adhere to these rules and they are
|
|
||||||
just assumed by the buffer deallocation pass. Functions on which the
|
|
||||||
deallocation pass is applied and the implementation is accessible are modified
|
|
||||||
by the pass such that the ABI is respected (i.e., buffer copies are inserted as
|
|
||||||
necessary).
|
|
||||||
|
|
||||||
### Inserting `bufferization.dealloc` operations
|
|
||||||
|
|
||||||
`bufferization.dealloc` operations are unconditionally inserted at the end of
|
|
||||||
each basic block (just before the terminator). The majority of the pass is about
|
|
||||||
finding the correct operands for this operation. There are three variadic
|
|
||||||
operand lists to be populated, the first contains all MemRef values that may
|
|
||||||
need to be deallocated, the second list contains their associated ownership
|
|
||||||
values (of `i1` type), and the third list contains MemRef values that are still
|
|
||||||
needed at a later point and should thus not be deallocated. This operation
|
|
||||||
allows us to deal with any kind of aliasing behavior: it lowers to runtime
|
|
||||||
aliasing checks when not enough information can be collected statically. When
|
|
||||||
enough aliasing information is statically available, operands or the entire op
|
|
||||||
may fold away.
|
|
||||||
|
|
||||||
**Ownerships**
|
|
||||||
|
|
||||||
To do so, we use a concept of ownership indicators of memrefs which materialize
|
|
||||||
as an `i1` value for any SSA value of `memref` type, indicating whether the
|
|
||||||
basic block in which it was materialized has ownership of this MemRef. Ideally,
|
|
||||||
this is a constant `true` or `false`, but might also be a non-constant SSA
|
|
||||||
value. To keep track of those ownership values without immediately materializing
|
|
||||||
them (which might require insertion of `bufferization.clone` operations or
|
|
||||||
operations checking for aliasing at runtime at positions where we don't actually
|
|
||||||
need a materialized value), we use the `Ownership` class. This class represents
|
|
||||||
the ownership in three states forming a lattice on a partial order:
|
|
||||||
```
|
|
||||||
forall X in SSA values. uninitialized < unique(X) < unknown
|
|
||||||
forall X, Y in SSA values.
|
|
||||||
unique(X) == unique(Y) iff X and Y always evaluate to the same value
|
|
||||||
unique(X) != unique(Y) otherwise
|
|
||||||
```
|
|
||||||
Intuitively, the states have the following meaning:
|
|
||||||
* Uninitialized: the ownership is not initialized yet, this is the default
|
|
||||||
state; once an operation is finished processing the ownership of all
|
|
||||||
operation results with MemRef type should not be uninitialized anymore.
|
|
||||||
* Unique: there is a specific SSA value that can be queried to check ownership
|
|
||||||
without materializing any additional IR
|
|
||||||
* Unknown: no specific SSA value is available without materializing additional
|
|
||||||
IR, typically this is because two ownerships in 'Unique' state would have to
|
|
||||||
be merged manually (e.g., the result of an `arith.select` either has the
|
|
||||||
ownership of the then or else case depending on the condition value,
|
|
||||||
inserting another `arith.select` for the ownership values can perform the
|
|
||||||
merge and provide a 'Unique' ownership for the result), however, in the
|
|
||||||
general case this 'Unknown' state has to be assigned.
|
|
||||||
|
|
||||||
Implied by the above partial order, the pass combines two ownerships in the
|
|
||||||
following way:
|
|
||||||
|
|
||||||
| Ownership 1 | Ownership 2 | Combined Ownership |
|
|
||||||
|:--------------|:--------------|:-------------------|
|
|
||||||
| uninitialized | uninitialized | uninitialized |
|
|
||||||
| unique(X) | uninitialized | unique(X) |
|
|
||||||
| unique(X) | unique(X) | unique(X) |
|
|
||||||
| unique(X) | unique(Y) | unknown |
|
|
||||||
| unknown | unique | unknown |
|
|
||||||
| unknown | uninitialized | unknown |
|
|
||||||
| <td colspan=3> + symmetric cases |
|
|
||||||
|
|
||||||
**Collecting the list of MemRefs that potentially need to be deallocated**
|
|
||||||
|
|
||||||
For a given block, the list of MemRefs that potentially need to be deallocated
|
|
||||||
at the end of that block is computed by keeping track of all values for which
|
|
||||||
the block potentially takes over ownership. This includes MemRefs provided as
|
|
||||||
basic block arguments, interface handlers for operations like `memref.alloc` and
|
|
||||||
`func.call`, but also liveness information in regions with multiple basic
|
|
||||||
blocks. More concretely, it is computed by taking the MemRefs in the 'in' set
|
|
||||||
of the liveness analysis of the current basic block B, appended by the MemRef
|
|
||||||
block arguments and by the set of MemRefs allocated in B itself (determined by
|
|
||||||
the interface handlers), then subtracted (also determined by the interface
|
|
||||||
handlers) by the set of MemRefs deallocated in B.
|
|
||||||
|
|
||||||
Note that we don't have to take the intersection of the liveness 'in' set with
|
|
||||||
the 'out' set of the predecessor block because a value that is in the 'in' set
|
|
||||||
must be defined in an ancestor block that dominates all direct predecessors and
|
|
||||||
thus the 'in' set of this block is a subset of the 'out' sets of each
|
|
||||||
predecessor.
|
|
||||||
|
|
||||||
```
|
|
||||||
memrefs = filter((liveIn(block) U
|
|
||||||
allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
|
|
||||||
```
|
|
||||||
|
|
||||||
The list of conditions for the second variadic operands list of
|
|
||||||
`bufferization.dealloc` is computed by querying the stored ownership value for
|
|
||||||
each of the MemRefs collected as described above. The ownership state is updated
|
|
||||||
by the interface handlers while processing the basic block.
|
|
||||||
|
|
||||||
**Collecting the list of MemRefs to retain**
|
|
||||||
|
|
||||||
Given a basic block B, the list of MemRefs that have to be retained can be
|
|
||||||
different for each successor block S. For the two basic blocks B and S and the
|
|
||||||
values passed via block arguments to the destination block S, we compute the
|
|
||||||
list of MemRefs that have to be retained in B by taking the MemRefs in the
|
|
||||||
successor operand list of the terminator and the MemRefs in the 'out' set of the
|
|
||||||
liveness analysis for B intersected with the 'in' set of the destination block
|
|
||||||
S.
|
|
||||||
|
|
||||||
This list of retained values makes sure that we cannot run into use-after-free
|
|
||||||
situations even if no aliasing information is present at compile-time.
|
|
||||||
|
|
||||||
```
|
|
||||||
toRetain = filter(successorOperands + (liveOut(fromBlock) insersect
|
|
||||||
liveIn(toBlock)), isMemRef)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Supported interfaces
|
|
||||||
|
|
||||||
The pass uses liveness analysis and a few interfaces:
|
|
||||||
* `FunctionOpInterface`
|
|
||||||
* `CallOpInterface`
|
|
||||||
* `MemoryEffectOpInterface`
|
|
||||||
* `RegionBranchOpInterface`
|
|
||||||
* `RegionBranchTerminatorOpInterface`
|
|
||||||
|
|
||||||
Due to insufficient information provided by the interface, it also special-cases
|
|
||||||
on the `cf.cond_br` operation and makes some assumptions about operations
|
|
||||||
implementing the `RegionBranchOpInterface` at the moment, but improving the
|
|
||||||
interfaces would allow us to remove those dependencies in the future.
|
|
||||||
|
|
||||||
### Limitations
|
|
||||||
|
|
||||||
The Buffer Deallocation pass has some requirements and limitations on the input
|
|
||||||
IR. These are checked in the beginning of the pass and errors are emitted
|
|
||||||
accordingly:
|
|
||||||
* The set of interfaces the pass operates on must be implemented (correctly).
|
|
||||||
E.g., if there is an operation present with a nested region, but does not
|
|
||||||
implement the `RegionBranchOpInterface`, an error is emitted because the
|
|
||||||
pass cannot know the semantics of the nested region (and does not make any
|
|
||||||
default assumptions on it).
|
|
||||||
* No explicit control-flow loops are present. Currently, only loops using
|
|
||||||
structural-control-flow are supported. However, this limitation could be
|
|
||||||
lifted in the future.
|
|
||||||
* Deallocation operations should not be present already. The pass should
|
|
||||||
handle them correctly already (at least in most cases), but it's not
|
|
||||||
supported yet due to insufficient testing.
|
|
||||||
* Terminators must implement either `RegionBranchTerminatorOpInterface` or
|
|
||||||
`BranchOpInterface`, but not both. Terminators with more than one successor
|
|
||||||
are not supported (except `cf.cond_br`). This is not a fundamental
|
|
||||||
limitation, but there is no use-case justifying the more complex
|
|
||||||
implementation at the moment.
|
|
||||||
|
|
||||||
### Example
|
|
||||||
|
|
||||||
The following example contains a few interesting cases:
|
|
||||||
* Basic block arguments are modified to also pass along the ownership
|
|
||||||
indicator, but not for entry blocks, where the function boundary ABI
|
|
||||||
is applied instead.
|
|
||||||
* The result of `arith.select` initially has 'Unknown' assigned as ownership,
|
|
||||||
but once the `bufferization.dealloc` operation is inserted it is put in the
|
|
||||||
'retained' list (since it has uses in a later basic block) and thus the
|
|
||||||
'Unknown' ownership can be replaced with a 'Unique' ownership using the
|
|
||||||
corresponding result of the dealloc operation.
|
|
||||||
* The `cf.cond_br` operation has more than one successor and thus has to
|
|
||||||
insert two `bufferization.dealloc` operations (one for each successor).
|
|
||||||
While they have the same list of MemRefs to deallocate (because they perform
|
|
||||||
the deallocations for the same block), it must be taken into account that
|
|
||||||
some MemRefs remain *live* for one branch but not the other (thus set
|
|
||||||
intersection is performed on the *live-out* of the current block and the
|
|
||||||
*live-in* of the target block). Also, `cf.cond_br` supports separate
|
|
||||||
forwarding operands for each successor. To make sure that no MemRef is
|
|
||||||
deallocated twice (because there are two `bufferization.dealloc` operations
|
|
||||||
with the same MemRefs to deallocate), the condition operands are adjusted to
|
|
||||||
take the branch condition into account. While a generic lowering for such
|
|
||||||
terminator operations could be implemented, a specialized implementation can
|
|
||||||
take all the semantics of this particular operation into account and thus
|
|
||||||
generate a more efficient lowering.
|
|
||||||
|
|
||||||
```mlir
|
|
||||||
func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
|
|
||||||
%alloc = memref.alloc() : memref<?xi8>
|
|
||||||
%alloca = memref.alloca() : memref<?xi8>
|
|
||||||
%select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
|
|
||||||
cf.cond_br %br_cond, ^bb1(%alloc : memref<?xi8>), ^bb1(%memref : memref<?xi8>)
|
|
||||||
^bb1(%bbarg: memref<?xi8>):
|
|
||||||
test.copy(%bbarg, %select) : (memref<?xi8>, memref<?xi8>)
|
|
||||||
return
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
After running `--ownership-based-buffer-deallocation`, it looks as follows:
|
|
||||||
|
|
||||||
```mlir
|
|
||||||
// Function boundary ABI: ownership of `%memref` will never be acquired.
|
|
||||||
func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
|
|
||||||
%false = arith.constant false
|
|
||||||
%true = arith.constant true
|
|
||||||
|
|
||||||
// The ownership of a MemRef defined by the `memref.alloc` operation is always
|
|
||||||
// assigned to be 'true'.
|
|
||||||
%alloc = memref.alloc() : memref<?xi8>
|
|
||||||
|
|
||||||
// The ownership of a MemRef defined by the `memref.alloca` operation is
|
|
||||||
// always assigned to be 'false'.
|
|
||||||
%alloca = memref.alloca() : memref<?xi8>
|
|
||||||
|
|
||||||
// The ownership of %select will be the join of the ownership of %alloc and
|
|
||||||
// the ownership of %alloca, i.e., of %true and %false. Because the pass does
|
|
||||||
// not know about the semantics of the `arith.select` operation (unless a
|
|
||||||
// custom handler is implemented), the ownership join will be 'Unknown'. If
|
|
||||||
// the materialized ownership indicator of %select is needed, either a clone
|
|
||||||
// has to be created for which %true is assigned as ownership or the result
|
|
||||||
// of a `bufferization.dealloc` where %select is in the retain list has to be
|
|
||||||
// used.
|
|
||||||
%select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
|
|
||||||
|
|
||||||
// We use `memref.extract_strided_metadata` to get the base memref since it is
|
|
||||||
// not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
|
|
||||||
// already enforced for `bufferization.dealloc`
|
|
||||||
%base_buffer_memref, ... = memref.extract_strided_metadata %memref
|
|
||||||
: memref<?xi8> -> memref<i8>, index, index, index
|
|
||||||
%base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
|
|
||||||
: memref<?xi8> -> memref<i8>, index, index, index
|
|
||||||
%base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
|
|
||||||
: memref<?xi8> -> memref<i8>, index, index, index
|
|
||||||
|
|
||||||
// The deallocation conditions need to be adjusted to incorporate the branch
|
|
||||||
// condition. In this example, this requires only a single negation, but might
|
|
||||||
// also require multiple arith.andi operations.
|
|
||||||
%not_br_cond = arith.xori %true, %br_cond : i1
|
|
||||||
|
|
||||||
// There are two dealloc operations inserted in this basic block, one per
|
|
||||||
// successor. Both have the same list of MemRefs to deallocate and the
|
|
||||||
// conditions only differ by the branch condition conjunct.
|
|
||||||
// Note, however, that the retained list differs. Here, both contain the
|
|
||||||
// %select value because it is used in both successors (since it's the same
|
|
||||||
// block), but the value passed via block argument differs (%memref vs.
|
|
||||||
// %alloc).
|
|
||||||
%10:2 = bufferization.dealloc
|
|
||||||
(%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
|
|
||||||
: memref<i8>, memref<i8>, memref<i8>)
|
|
||||||
if (%false, %br_cond, %false)
|
|
||||||
retain (%alloc, %select : memref<?xi8>, memref<?xi8>)
|
|
||||||
|
|
||||||
%11:2 = bufferization.dealloc
|
|
||||||
(%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
|
|
||||||
: memref<i8>, memref<i8>, memref<i8>)
|
|
||||||
if (%false, %not_br_cond, %false)
|
|
||||||
retain (%memref, %select : memref<?xi8>, memref<?xi8>)
|
|
||||||
|
|
||||||
// Because %select is used in ^bb1 without passing it via block argument, we
|
|
||||||
// need to update it's ownership value here by merging the ownership values
|
|
||||||
// returned by the dealloc operations
|
|
||||||
%new_ownership = arith.select %br_cond, %10#1, %11#1 : i1
|
|
||||||
|
|
||||||
// The terminator is modified to pass along the ownership indicator values
|
|
||||||
// with each MemRef value.
|
|
||||||
cf.cond_br %br_cond, ^bb1(%alloc, %10#0 : memref<?xi8>, i1),
|
|
||||||
^bb1(%memref, %11#0 : memref<?xi8>, i1)
|
|
||||||
|
|
||||||
// All non-entry basic blocks are modified to have an additional i1 argument for
|
|
||||||
// each MemRef value in the argument list.
|
|
||||||
^bb1(%13: memref<?xi8>, %14: i1): // 2 preds: ^bb0, ^bb0
|
|
||||||
test.copy(%13, %select) : (memref<?xi8>, memref<?xi8>)
|
|
||||||
|
|
||||||
%base_buffer_13, ... = memref.extract_strided_metadata %13
|
|
||||||
: memref<?xi8> -> memref<i8>, index, index, index
|
|
||||||
%base_buffer_select, ... = memref.extract_strided_metadata %select
|
|
||||||
: memref<?xi8> -> memref<i8>, index, index, index
|
|
||||||
|
|
||||||
// Here, we don't have a retained list, because the block has no successors
|
|
||||||
// and the return has no operands.
|
|
||||||
bufferization.dealloc (%base_buffer_13, %base_buffer_select
|
|
||||||
: memref<i8>, memref<i8>)
|
|
||||||
if (%14, %new_ownership)
|
|
||||||
return
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Buffer Deallocation Simplification Pass
|
|
||||||
|
|
||||||
The [semantics of the `bufferization.dealloc` operation](https://mlir.llvm.org/docs/Dialects/BufferizationOps/#bufferizationdealloc-bufferizationdeallocop)
|
|
||||||
provide a lot of opportunities for optimizations which can be conveniently split
|
|
||||||
into patterns using the greedy pattern rewriter. Some of those patterns need
|
|
||||||
access to additional analyses such as an analysis that can determine whether two
|
|
||||||
MemRef values must, may, or never originate from the same buffer allocation.
|
|
||||||
These patterns are collected in the Buffer Deallocation Simplification pass,
|
|
||||||
while patterns that don't need additional analyses are registered as part of the
|
|
||||||
regular canonicalizer pass. This pass is best run after
|
|
||||||
`--ownership-based-buffer-deallocation` followed by `--canonicalize`.
|
|
||||||
|
|
||||||
The pass applies patterns for the following simplifications:
|
|
||||||
* Remove MemRefs from retain list when guaranteed to not alias with any value
|
|
||||||
in the 'memref' operand list. This avoids an additional aliasing check with
|
|
||||||
the removed value.
|
|
||||||
* Split off values in the 'memref' list to new `bufferization.dealloc`
|
|
||||||
operations only containing this value in the 'memref' list when it is
|
|
||||||
guaranteed to not alias with any other value in the 'memref' list. This
|
|
||||||
avoids at least one aliasing check at runtime and enables using a more
|
|
||||||
efficient lowering for this new `bufferization.dealloc` operation.
|
|
||||||
* Remove values from the 'memref' operand list when it is guaranteed to alias
|
|
||||||
with at least one value in the 'retained' list and may not alias any other
|
|
||||||
value in the 'retain' list.
|
|
||||||
|
|
||||||
## Lower Deallocations Pass
|
|
||||||
|
|
||||||
The `-lower-deallocations` pass transforms all `bufferization.dealloc`
|
|
||||||
operations to `memref.dealloc` operations and may also insert operations from
|
|
||||||
the `scf`, `func`, and `arith` dialects to make deallocations conditional and
|
|
||||||
check whether two MemRef values come from the same allocation at runtime (when
|
|
||||||
the `buffer-deallocation-simplification` pass wasn't able to determine it
|
|
||||||
statically).
|
|
||||||
|
|
||||||
The same lowering of the `bufferization.dealloc` operation is also part of the
|
|
||||||
`-convert-bufferization-to-memref` conversion pass which also lowers all the
|
|
||||||
other operations of the bufferization dialect.
|
|
||||||
|
|
||||||
We distinguish multiple cases in this lowering pass to provide an overall more
|
|
||||||
efficient lowering. In the general case, a library function is created to avoid
|
|
||||||
quadratic code size explosion (relative to the number of operands of the dealloc
|
|
||||||
operation). The specialized lowerings aim to avoid this library function because
|
|
||||||
it requires allocating auxiliary MemRefs of index values.
|
|
||||||
|
|
||||||
### Generic Lowering
|
|
||||||
|
|
||||||
A library function is generated to avoid code-size blow-up. On a high level, the
|
|
||||||
base-memref of all operands is extracted as an index value and stored into
|
|
||||||
specifically allocated MemRefs and passed to the library function which then
|
|
||||||
determines whether they come from the same original allocation. This information
|
|
||||||
is needed to avoid double-free situations and to correctly retain the MemRef
|
|
||||||
values in the `retained` list.
|
|
||||||
|
|
||||||
**Dealloc Operation Lowering**
|
|
||||||
|
|
||||||
This lowering supports all features the dealloc operation has to offer. It
|
|
||||||
computes the base pointer of each memref (as an index), stores it in a
|
|
||||||
new memref helper structure and passes it to the helper function generated
|
|
||||||
in `buildDeallocationLibraryFunction`. The results are stored in two lists
|
|
||||||
(represented as MemRefs) of booleans passed as arguments. The first list
|
|
||||||
stores whether the corresponding condition should be deallocated, the
|
|
||||||
second list stores the ownership of the retained values which can be used
|
|
||||||
to replace the result values of the `bufferization.dealloc` operation.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
```
|
|
||||||
%0:2 = bufferization.dealloc (%m0, %m1 : memref<2xf32>, memref<5xf32>)
|
|
||||||
if (%cond0, %cond1)
|
|
||||||
retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
|
|
||||||
```
|
|
||||||
lowers to (simplified):
|
|
||||||
```
|
|
||||||
%c0 = arith.constant 0 : index
|
|
||||||
%c1 = arith.constant 1 : index
|
|
||||||
%dealloc_base_pointer_list = memref.alloc() : memref<2xindex>
|
|
||||||
%cond_list = memref.alloc() : memref<2xi1>
|
|
||||||
%retain_base_pointer_list = memref.alloc() : memref<2xindex>
|
|
||||||
%m0_base_pointer = memref.extract_aligned_pointer_as_index %m0
|
|
||||||
memref.store %m0_base_pointer, %dealloc_base_pointer_list[%c0]
|
|
||||||
%m1_base_pointer = memref.extract_aligned_pointer_as_index %m1
|
|
||||||
memref.store %m1_base_pointer, %dealloc_base_pointer_list[%c1]
|
|
||||||
memref.store %cond0, %cond_list[%c0]
|
|
||||||
memref.store %cond1, %cond_list[%c1]
|
|
||||||
%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
|
|
||||||
memref.store %r0_base_pointer, %retain_base_pointer_list[%c0]
|
|
||||||
%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
|
|
||||||
memref.store %r1_base_pointer, %retain_base_pointer_list[%c1]
|
|
||||||
%dyn_dealloc_base_pointer_list = memref.cast %dealloc_base_pointer_list :
|
|
||||||
memref<2xindex> to memref<?xindex>
|
|
||||||
%dyn_cond_list = memref.cast %cond_list : memref<2xi1> to memref<?xi1>
|
|
||||||
%dyn_retain_base_pointer_list = memref.cast %retain_base_pointer_list :
|
|
||||||
memref<2xindex> to memref<?xindex>
|
|
||||||
%dealloc_cond_out = memref.alloc() : memref<2xi1>
|
|
||||||
%ownership_out = memref.alloc() : memref<2xi1>
|
|
||||||
%dyn_dealloc_cond_out = memref.cast %dealloc_cond_out :
|
|
||||||
memref<2xi1> to memref<?xi1>
|
|
||||||
%dyn_ownership_out = memref.cast %ownership_out :
|
|
||||||
memref<2xi1> to memref<?xi1>
|
|
||||||
call @dealloc_helper(%dyn_dealloc_base_pointer_list,
|
|
||||||
%dyn_retain_base_pointer_list,
|
|
||||||
%dyn_cond_list,
|
|
||||||
%dyn_dealloc_cond_out,
|
|
||||||
%dyn_ownership_out) : (...)
|
|
||||||
%m0_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c0] : memref<2xi1>
|
|
||||||
scf.if %m0_dealloc_cond {
|
|
||||||
memref.dealloc %m0 : memref<2xf32>
|
|
||||||
}
|
|
||||||
%m1_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c1] : memref<2xi1>
|
|
||||||
scf.if %m1_dealloc_cond {
|
|
||||||
memref.dealloc %m1 : memref<5xf32>
|
|
||||||
}
|
|
||||||
%r0_ownership = memref.load %dyn_ownership_out[%c0] : memref<2xi1>
|
|
||||||
%r1_ownership = memref.load %dyn_ownership_out[%c1] : memref<2xi1>
|
|
||||||
memref.dealloc %dealloc_base_pointer_list : memref<2xindex>
|
|
||||||
memref.dealloc %retain_base_pointer_list : memref<2xindex>
|
|
||||||
memref.dealloc %cond_list : memref<2xi1>
|
|
||||||
memref.dealloc %dealloc_cond_out : memref<2xi1>
|
|
||||||
memref.dealloc %ownership_out : memref<2xi1>
|
|
||||||
// replace %0#0 with %r0_ownership
|
|
||||||
// replace %0#1 with %r1_ownership
|
|
||||||
```
|
|
||||||
|
|
||||||
**Library function**
|
|
||||||
|
|
||||||
A library function is built per compilation unit that can be called at
|
|
||||||
bufferization dealloc sites to determine whether two MemRefs come from the same
|
|
||||||
allocation and their new ownerships.
|
|
||||||
|
|
||||||
The generated function takes two MemRefs of indices and three MemRefs of
|
|
||||||
booleans as arguments:
|
|
||||||
* The first argument A should contain the result of the
|
|
||||||
extract_aligned_pointer_as_index operation applied to the MemRefs to be
|
|
||||||
deallocated
|
|
||||||
* The second argument B should contain the result of the
|
|
||||||
extract_aligned_pointer_as_index operation applied to the MemRefs to be
|
|
||||||
retained
|
|
||||||
* The third argument C should contain the conditions as passed directly
|
|
||||||
to the deallocation operation.
|
|
||||||
* The fourth argument D is used to pass results to the caller. Those
|
|
||||||
represent the condition under which the MemRef at the corresponding
|
|
||||||
position in A should be deallocated.
|
|
||||||
* The fifth argument E is used to pass results to the caller. It
|
|
||||||
provides the ownership value corresponding the the MemRef at the same
|
|
||||||
position in B
|
|
||||||
|
|
||||||
This helper function is supposed to be called once for each
|
|
||||||
`bufferization.dealloc` operation to determine the deallocation need and
|
|
||||||
new ownership indicator for the retained values, but does not perform the
|
|
||||||
deallocation itself.
|
|
||||||
|
|
||||||
Generated code:
|
|
||||||
```
|
|
||||||
func.func @dealloc_helper(
|
|
||||||
%dyn_dealloc_base_pointer_list: memref<?xindex>,
|
|
||||||
%dyn_retain_base_pointer_list: memref<?xindex>,
|
|
||||||
%dyn_cond_list: memref<?xi1>,
|
|
||||||
%dyn_dealloc_cond_out: memref<?xi1>,
|
|
||||||
%dyn_ownership_out: memref<?xi1>) {
|
|
||||||
%c0 = arith.constant 0 : index
|
|
||||||
%c1 = arith.constant 1 : index
|
|
||||||
%true = arith.constant true
|
|
||||||
%false = arith.constant false
|
|
||||||
%num_dealloc_memrefs = memref.dim %dyn_dealloc_base_pointer_list, %c0
|
|
||||||
%num_retain_memrefs = memref.dim %dyn_retain_base_pointer_list, %c0
|
|
||||||
// Zero initialize result buffer.
|
|
||||||
scf.for %i = %c0 to %num_retain_memrefs step %c1 {
|
|
||||||
memref.store %false, %dyn_ownership_out[%i] : memref<?xi1>
|
|
||||||
}
|
|
||||||
scf.for %i = %c0 to %num_dealloc_memrefs step %c1 {
|
|
||||||
%dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%i]
|
|
||||||
%cond = memref.load %dyn_cond_list[%i]
|
|
||||||
// Check for aliasing with retained memrefs.
|
|
||||||
%does_not_alias_retained = scf.for %j = %c0 to %num_retain_memrefs
|
|
||||||
step %c1 iter_args(%does_not_alias_aggregated = %true) -> (i1) {
|
|
||||||
%retain_bp = memref.load %dyn_retain_base_pointer_list[%j]
|
|
||||||
%does_alias = arith.cmpi eq, %retain_bp, %dealloc_bp : index
|
|
||||||
scf.if %does_alias {
|
|
||||||
%curr_ownership = memref.load %dyn_ownership_out[%j]
|
|
||||||
%updated_ownership = arith.ori %curr_ownership, %cond : i1
|
|
||||||
memref.store %updated_ownership, %dyn_ownership_out[%j]
|
|
||||||
}
|
|
||||||
%does_not_alias = arith.cmpi ne, %retain_bp, %dealloc_bp : index
|
|
||||||
%updated_aggregate = arith.andi %does_not_alias_aggregated,
|
|
||||||
%does_not_alias : i1
|
|
||||||
scf.yield %updated_aggregate : i1
|
|
||||||
}
|
|
||||||
// Check for aliasing with dealloc memrefs in the list before the
|
|
||||||
// current one, i.e.,
|
|
||||||
// `fix i, forall j < i: check_aliasing(%dyn_dealloc_base_pointer[j],
|
|
||||||
// %dyn_dealloc_base_pointer[i])`
|
|
||||||
%does_not_alias_any = scf.for %j = %c0 to %i step %c1
|
|
||||||
iter_args(%does_not_alias_agg = %does_not_alias_retained) -> (i1) {
|
|
||||||
%prev_dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%j]
|
|
||||||
%does_not_alias = arith.cmpi ne, %prev_dealloc_bp, %dealloc_bp
|
|
||||||
%updated_alias_agg = arith.andi %does_not_alias_agg, %does_not_alias
|
|
||||||
scf.yield %updated_alias_agg : i1
|
|
||||||
}
|
|
||||||
%dealloc_cond = arith.andi %does_not_alias_any, %cond : i1
|
|
||||||
memref.store %dealloc_cond, %dyn_dealloc_cond_out[%i] : memref<?xi1>
|
|
||||||
}
|
|
||||||
return
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Specialized Lowerings
|
|
||||||
|
|
||||||
Currently, there are two special lowerings for common cases to avoid the library
|
|
||||||
function and thus unnecessary memory load and store operations and function
|
|
||||||
calls:
|
|
||||||
|
|
||||||
**One memref, no retained**
|
|
||||||
|
|
||||||
Lower a simple case without any retained values and a single MemRef. Ideally,
|
|
||||||
static analysis can provide enough information such that the
|
|
||||||
`buffer-deallocation-simplification` pass is able to split the dealloc
|
|
||||||
operations up into this simple case as much as possible before running this
|
|
||||||
pass.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
```mlir
|
|
||||||
bufferization.dealloc (%arg0 : memref<2xf32>) if (%arg1)
|
|
||||||
```
|
|
||||||
is lowered to
|
|
||||||
```mlir
|
|
||||||
scf.if %arg1 {
|
|
||||||
memref.dealloc %arg0 : memref<2xf32>
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
In most cases, the branch condition is either constant 'true' or 'false' and can
|
|
||||||
thus be optimized away entirely by the canonicalizer pass.
|
|
||||||
|
|
||||||
**One memref, arbitrarily many retained**
|
|
||||||
|
|
||||||
A special case lowering for the deallocation operation with exactly one MemRef,
|
|
||||||
but an arbitrary number of retained values. The size of the code produced by
|
|
||||||
this lowering is linear to the number of retained values.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
```mlir
|
|
||||||
%0:2 = bufferization.dealloc (%m : memref<2xf32>) if (%cond)
|
|
||||||
retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
|
|
||||||
return %0#0, %0#1 : i1, i1
|
|
||||||
```
|
|
||||||
is lowered to
|
|
||||||
```mlir
|
|
||||||
%m_base_pointer = memref.extract_aligned_pointer_as_index %m
|
|
||||||
%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
|
|
||||||
%r0_does_not_alias = arith.cmpi ne, %m_base_pointer, %r0_base_pointer
|
|
||||||
%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
|
|
||||||
%r1_does_not_alias = arith.cmpi ne, %m_base_pointer, %r1_base_pointer
|
|
||||||
%not_retained = arith.andi %r0_does_not_alias, %r1_does_not_alias : i1
|
|
||||||
%should_dealloc = arith.andi %not_retained, %cond : i1
|
|
||||||
scf.if %should_dealloc {
|
|
||||||
memref.dealloc %m : memref<2xf32>
|
|
||||||
}
|
|
||||||
%true = arith.constant true
|
|
||||||
%r0_does_alias = arith.xori %r0_does_not_alias, %true : i1
|
|
||||||
%r0_ownership = arith.andi %r0_does_alias, %cond : i1
|
|
||||||
%r1_does_alias = arith.xori %r1_does_not_alias, %true : i1
|
|
||||||
%r1_ownership = arith.andi %r1_does_alias, %cond : i1
|
|
||||||
return %r0_ownership, %r1_ownership : i1, i1
|
|
||||||
```
|
|
||||||
|
|
||||||
## Memory Layouts
|
## Memory Layouts
|
||||||
|
|
||||||
One-Shot Bufferize bufferizes ops from top to bottom. This works well when all
|
One-Shot Bufferize bufferizes ops from top to bottom. This works well when all
|
||||||
@ -1059,7 +408,7 @@ of:
|
|||||||
1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
|
1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
|
||||||
`promote-buffers-to-stack`, which do optimizations that are only exposed
|
`promote-buffers-to-stack`, which do optimizations that are only exposed
|
||||||
after bufferization.
|
after bufferization.
|
||||||
1. Finally, running the [buffer deallocation](BufferDeallocationInternals.md)
|
1. Finally, running the [ownership-based buffer deallocation](OwnershipBasedBufferDeallocation.md)
|
||||||
pass.
|
pass.
|
||||||
|
|
||||||
After buffer deallocation has been completed, the program will be quite
|
After buffer deallocation has been completed, the program will be quite
|
||||||
|
619
mlir/docs/OwnershipBasedBufferDeallocation.md
Normal file
619
mlir/docs/OwnershipBasedBufferDeallocation.md
Normal file
@ -0,0 +1,619 @@
|
|||||||
|
# Ownership-based Buffer Deallocation
|
||||||
|
|
||||||
|
[TOC]
|
||||||
|
|
||||||
|
One-Shot Bufferize does not deallocate any buffers that it allocates. After
|
||||||
|
running One-Shot Bufferize, the resulting IR may have a number of `memref.alloc`
|
||||||
|
ops, but no `memref.dealloc` ops. Buffer dellocation is delegated to the
|
||||||
|
`-ownership-based-buffer-deallocation` pass. This pass supersedes the now
|
||||||
|
deprecated `-buffer-deallocation` pass, which does not work well with
|
||||||
|
One-Shot Bufferize.
|
||||||
|
|
||||||
|
On a high level, buffers are "owned" by a basic block. Ownership materializes
|
||||||
|
as an `i1` SSA value and can be thought of as "responsibility to deallocate". It
|
||||||
|
is conceptually similar to `std::unique_ptr` in C++.
|
||||||
|
|
||||||
|
There are few additional preprocessing and postprocessing passes that should be
|
||||||
|
run together with the ownership-based buffer deallocation pass. The recommended
|
||||||
|
compilation pipeline is as follows:
|
||||||
|
|
||||||
|
```
|
||||||
|
one-shot-bufferize
|
||||||
|
| it's recommended to perform all bufferization here at latest,
|
||||||
|
| <- any allocations inserted after this point have to be handled
|
||||||
|
V manually
|
||||||
|
expand-realloc
|
||||||
|
V
|
||||||
|
ownership-based-buffer-deallocation
|
||||||
|
V
|
||||||
|
canonicalize <- mostly for scf.if simplifications
|
||||||
|
V
|
||||||
|
buffer-deallocation-simplification
|
||||||
|
V <- from this point onwards no tensor values are allowed
|
||||||
|
lower-deallocations
|
||||||
|
V
|
||||||
|
CSE
|
||||||
|
V
|
||||||
|
canonicalize
|
||||||
|
```
|
||||||
|
|
||||||
|
The entire deallocation pipeline (excluding `-one-shot-bufferize`) is exposed
|
||||||
|
as `-buffer-deallocation-pipeline`.
|
||||||
|
|
||||||
|
The ownership-based buffer deallocation pass processes operations implementing
|
||||||
|
`FunctionOpInterface` one-by-one without analysing the call-graph.
|
||||||
|
This means that there have to be [some rules](#function-boundary-abi) on how
|
||||||
|
MemRefs are handled when being passed from one function to another. The rest of
|
||||||
|
the pass revolves heavily around the `bufferization.dealloc` operation which is
|
||||||
|
inserted at the end of each basic block with appropriate operands and should be
|
||||||
|
optimized using the Buffer Deallocation Simplification pass
|
||||||
|
(`--buffer-deallocation-simplification`) and the regular canonicalizer
|
||||||
|
(`--canonicalize`). Lowering the result of the
|
||||||
|
`-ownership-based-buffer-deallocation` pass directly using
|
||||||
|
`--convert-bufferization-to-memref` without beforehand optimization is not
|
||||||
|
recommended as it will lead to very inefficient code (the runtime-cost of
|
||||||
|
`bufferization.dealloc` is `O(|memrefs|^2+|memref|*|retained|)`).
|
||||||
|
|
||||||
|
## Function boundary ABI
|
||||||
|
|
||||||
|
The Buffer Deallocation pass operates on the level of operations implementing
|
||||||
|
the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
|
||||||
|
also return them. To ensure compatibility among all functions (including
|
||||||
|
external ones), some rules have to be enforced:
|
||||||
|
* When a MemRef is passed as a function argument, ownership is never acquired.
|
||||||
|
It is always the caller's responsibility to deallocate such MemRefs.
|
||||||
|
* Returning a MemRef from a function always passes ownership to the caller,
|
||||||
|
i.e., it is also the caller's responsibility to deallocate memrefs returned
|
||||||
|
from a called function.
|
||||||
|
* A function must not return a MemRef with the same allocated base buffer as
|
||||||
|
one of its arguments (in this case a copy has to be created). Note that in
|
||||||
|
this context two subviews of the same buffer that don't overlap are also
|
||||||
|
considered to alias.
|
||||||
|
|
||||||
|
For external functions (e.g., library functions written externally in C), the
|
||||||
|
externally provided implementation has to adhere to these rules and they are
|
||||||
|
just assumed by the buffer deallocation pass. Functions on which the
|
||||||
|
deallocation pass is applied and for which the implementation is accessible are
|
||||||
|
modified by the pass such that the ABI is respected (i.e., buffer copies are
|
||||||
|
inserted when necessary).
|
||||||
|
|
||||||
|
## Inserting `bufferization.dealloc` operations
|
||||||
|
|
||||||
|
`bufferization.dealloc` and ownership indicators are the main abstractions in
|
||||||
|
the ownership-based buffer deallocation pass. `bufferization.dealloc`
|
||||||
|
deallocates all given buffers if the respective ownership indicator is set and
|
||||||
|
there is no aliasing buffer in the retain list.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
`bufferization.dealloc` operations are unconditionally inserted at the end of
|
||||||
|
each basic block (just before the terminator). The majority of the pass is about
|
||||||
|
finding the correct operands for this operation. There are three variadic
|
||||||
|
operand lists to be populated, the first contains all MemRef values that may
|
||||||
|
need to be deallocated, the second list contains their associated ownership
|
||||||
|
values (of `i1` type), and the third list contains MemRef values that are still
|
||||||
|
needed at a later point and should thus not be deallocated (e.g., yielded or
|
||||||
|
returned buffers).
|
||||||
|
|
||||||
|
`bufferization.dealloc` allows us to deal with any kind of aliasing behavior: it
|
||||||
|
lowers to runtime aliasing checks when not enough information can be collected
|
||||||
|
statically. When enough aliasing information is statically available, operands
|
||||||
|
or the entire op may fold away.
|
||||||
|
|
||||||
|
**Ownerships**
|
||||||
|
|
||||||
|
To do so, we use a concept of ownership indicators of memrefs which materialize
|
||||||
|
as an `i1` value for any SSA value of `memref` type, indicating whether the
|
||||||
|
basic block in which it was materialized has ownership of this MemRef. Ideally,
|
||||||
|
this is a constant `true` or `false`, but might also be a non-constant SSA
|
||||||
|
value. To keep track of those ownership values without immediately materializing
|
||||||
|
them (which might require insertion of `bufferization.clone` operations or
|
||||||
|
operations checking for aliasing at runtime at positions where we don't actually
|
||||||
|
need a materialized value), we use the `Ownership` class. This class represents
|
||||||
|
the ownership in three states forming a lattice on a partial order:
|
||||||
|
```
|
||||||
|
forall X in SSA values. uninitialized < unique(X) < unknown
|
||||||
|
forall X, Y in SSA values.
|
||||||
|
unique(X) == unique(Y) iff X and Y always evaluate to the same value
|
||||||
|
unique(X) != unique(Y) otherwise
|
||||||
|
```
|
||||||
|
Intuitively, the states have the following meaning:
|
||||||
|
* Uninitialized: the ownership is not initialized yet, this is the default
|
||||||
|
state; once an operation is finished processing the ownership of all
|
||||||
|
operation results with MemRef type should not be uninitialized anymore.
|
||||||
|
* Unique: there is a specific SSA value that can be queried to check ownership
|
||||||
|
without materializing any additional IR
|
||||||
|
* Unknown: no specific SSA value is available without materializing additional
|
||||||
|
IR, typically this is because two ownerships in 'Unique' state would have to
|
||||||
|
be merged manually (e.g., the result of an `arith.select` either has the
|
||||||
|
ownership of the then or else case depending on the condition value,
|
||||||
|
inserting another `arith.select` for the ownership values can perform the
|
||||||
|
merge and provide a 'Unique' ownership for the result), however, in the
|
||||||
|
general case this 'Unknown' state has to be assigned.
|
||||||
|
|
||||||
|
Implied by the above partial order, the pass combines two ownerships in the
|
||||||
|
following way:
|
||||||
|
|
||||||
|
| Ownership 1 | Ownership 2 | Combined Ownership |
|
||||||
|
|:--------------|:--------------|:-------------------|
|
||||||
|
| uninitialized | uninitialized | uninitialized |
|
||||||
|
| unique(X) | uninitialized | unique(X) |
|
||||||
|
| unique(X) | unique(X) | unique(X) |
|
||||||
|
| unique(X) | unique(Y) | unknown |
|
||||||
|
| unknown | unique | unknown |
|
||||||
|
| unknown | uninitialized | unknown |
|
||||||
|
| <td colspan=3> + symmetric cases |
|
||||||
|
|
||||||
|
**Collecting the list of MemRefs that potentially need to be deallocated**
|
||||||
|
|
||||||
|
For a given block, the list of MemRefs that potentially need to be deallocated
|
||||||
|
at the end of that block is computed by keeping track of all values for which
|
||||||
|
the block potentially takes over ownership. This includes MemRefs provided as
|
||||||
|
basic block arguments, interface handlers for operations like `memref.alloc` and
|
||||||
|
`func.call`, but also liveness information in regions with multiple basic
|
||||||
|
blocks. More concretely, it is computed by taking the MemRefs in the 'in' set
|
||||||
|
of the liveness analysis of the current basic block B, appended by the MemRef
|
||||||
|
block arguments and by the set of MemRefs allocated in B itself (determined by
|
||||||
|
the interface handlers), then subtracted (also determined by the interface
|
||||||
|
handlers) by the set of MemRefs deallocated in B.
|
||||||
|
|
||||||
|
Note that we don't have to take the intersection of the liveness 'in' set with
|
||||||
|
the 'out' set of the predecessor block because a value that is in the 'in' set
|
||||||
|
must be defined in an ancestor block that dominates all direct predecessors and
|
||||||
|
thus the 'in' set of this block is a subset of the 'out' sets of each
|
||||||
|
predecessor.
|
||||||
|
|
||||||
|
```
|
||||||
|
memrefs = filter((liveIn(block) U
|
||||||
|
allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
|
||||||
|
```
|
||||||
|
|
||||||
|
The list of conditions for the second variadic operands list of
|
||||||
|
`bufferization.dealloc` is computed by querying the stored ownership value for
|
||||||
|
each of the MemRefs collected as described above. The ownership state is updated
|
||||||
|
by the interface handlers while processing the basic block.
|
||||||
|
|
||||||
|
**Collecting the list of MemRefs to retain**
|
||||||
|
|
||||||
|
Given a basic block B, the list of MemRefs that have to be retained can be
|
||||||
|
different for each successor block S. For the two basic blocks B and S and the
|
||||||
|
values passed via block arguments to the destination block S, we compute the
|
||||||
|
list of MemRefs that have to be retained in B by taking the MemRefs in the
|
||||||
|
successor operand list of the terminator and the MemRefs in the 'out' set of the
|
||||||
|
liveness analysis for B intersected with the 'in' set of the destination block
|
||||||
|
S.
|
||||||
|
|
||||||
|
This list of retained values makes sure that we cannot run into use-after-free
|
||||||
|
situations even if no aliasing information is present at compile-time.
|
||||||
|
|
||||||
|
```
|
||||||
|
toRetain = filter(successorOperands + (liveOut(fromBlock) insersect
|
||||||
|
liveIn(toBlock)), isMemRef)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Supported interfaces
|
||||||
|
|
||||||
|
The pass uses liveness analysis and a few interfaces:
|
||||||
|
* `FunctionOpInterface`
|
||||||
|
* `CallOpInterface`
|
||||||
|
* `MemoryEffectOpInterface`
|
||||||
|
* `RegionBranchOpInterface`
|
||||||
|
* `RegionBranchTerminatorOpInterface`
|
||||||
|
|
||||||
|
Due to insufficient information provided by the interface, it also special-cases
|
||||||
|
on the `cf.cond_br` operation and makes some assumptions about operations
|
||||||
|
implementing the `RegionBranchOpInterface` at the moment, but improving the
|
||||||
|
interfaces would allow us to remove those dependencies in the future.
|
||||||
|
|
||||||
|
## Limitations
|
||||||
|
|
||||||
|
The Buffer Deallocation pass has some requirements and limitations on the input
|
||||||
|
IR. These are checked in the beginning of the pass and errors are emitted
|
||||||
|
accordingly:
|
||||||
|
* The set of interfaces the pass operates on must be implemented (correctly).
|
||||||
|
E.g., if there is an operation present with a nested region, but does not
|
||||||
|
implement the `RegionBranchOpInterface`, an error is emitted because the
|
||||||
|
pass cannot know the semantics of the nested region (and does not make any
|
||||||
|
default assumptions on it).
|
||||||
|
* No explicit control-flow loops are present. Currently, only loops using
|
||||||
|
structural-control-flow are supported. However, this limitation could be
|
||||||
|
lifted in the future.
|
||||||
|
* Deallocation operations should not be present already. The pass should
|
||||||
|
handle them correctly already (at least in most cases), but it's not
|
||||||
|
supported yet due to insufficient testing.
|
||||||
|
* Terminators must implement either `RegionBranchTerminatorOpInterface` or
|
||||||
|
`BranchOpInterface`, but not both. Terminators with more than one successor
|
||||||
|
are not supported (except `cf.cond_br`). This is not a fundamental
|
||||||
|
limitation, but there is no use-case justifying the more complex
|
||||||
|
implementation at the moment.
|
||||||
|
|
||||||
|
## Example
|
||||||
|
|
||||||
|
The following example contains a few interesting cases:
|
||||||
|
* Basic block arguments are modified to also pass along the ownership
|
||||||
|
indicator, but not for entry blocks, where the function boundary ABI
|
||||||
|
is applied instead.
|
||||||
|
* The result of `arith.select` initially has 'Unknown' assigned as ownership,
|
||||||
|
but once the `bufferization.dealloc` operation is inserted it is put in the
|
||||||
|
'retained' list (since it has uses in a later basic block) and thus the
|
||||||
|
'Unknown' ownership can be replaced with a 'Unique' ownership using the
|
||||||
|
corresponding result of the dealloc operation.
|
||||||
|
* The `cf.cond_br` operation has more than one successor and thus has to
|
||||||
|
insert two `bufferization.dealloc` operations (one for each successor).
|
||||||
|
While they have the same list of MemRefs to deallocate (because they perform
|
||||||
|
the deallocations for the same block), it must be taken into account that
|
||||||
|
some MemRefs remain *live* for one branch but not the other (thus set
|
||||||
|
intersection is performed on the *live-out* of the current block and the
|
||||||
|
*live-in* of the target block). Also, `cf.cond_br` supports separate
|
||||||
|
forwarding operands for each successor. To make sure that no MemRef is
|
||||||
|
deallocated twice (because there are two `bufferization.dealloc` operations
|
||||||
|
with the same MemRefs to deallocate), the condition operands are adjusted to
|
||||||
|
take the branch condition into account. While a generic lowering for such
|
||||||
|
terminator operations could be implemented, a specialized implementation can
|
||||||
|
take all the semantics of this particular operation into account and thus
|
||||||
|
generate a more efficient lowering.
|
||||||
|
|
||||||
|
```mlir
|
||||||
|
func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
|
||||||
|
%alloc = memref.alloc() : memref<?xi8>
|
||||||
|
%alloca = memref.alloca() : memref<?xi8>
|
||||||
|
%select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
|
||||||
|
cf.cond_br %br_cond, ^bb1(%alloc : memref<?xi8>), ^bb1(%memref : memref<?xi8>)
|
||||||
|
^bb1(%bbarg: memref<?xi8>):
|
||||||
|
test.copy(%bbarg, %select) : (memref<?xi8>, memref<?xi8>)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
After running `--ownership-based-buffer-deallocation`, it looks as follows:
|
||||||
|
|
||||||
|
```mlir
|
||||||
|
// Function boundary ABI: ownership of `%memref` will never be acquired.
|
||||||
|
func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
|
||||||
|
%false = arith.constant false
|
||||||
|
%true = arith.constant true
|
||||||
|
|
||||||
|
// The ownership of a MemRef defined by the `memref.alloc` operation is always
|
||||||
|
// assigned to be 'true'.
|
||||||
|
%alloc = memref.alloc() : memref<?xi8>
|
||||||
|
|
||||||
|
// The ownership of a MemRef defined by the `memref.alloca` operation is
|
||||||
|
// always assigned to be 'false'.
|
||||||
|
%alloca = memref.alloca() : memref<?xi8>
|
||||||
|
|
||||||
|
// The ownership of %select will be the join of the ownership of %alloc and
|
||||||
|
// the ownership of %alloca, i.e., of %true and %false. Because the pass does
|
||||||
|
// not know about the semantics of the `arith.select` operation (unless a
|
||||||
|
// custom handler is implemented), the ownership join will be 'Unknown'. If
|
||||||
|
// the materialized ownership indicator of %select is needed, either a clone
|
||||||
|
// has to be created for which %true is assigned as ownership or the result
|
||||||
|
// of a `bufferization.dealloc` where %select is in the retain list has to be
|
||||||
|
// used.
|
||||||
|
%select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
|
||||||
|
|
||||||
|
// We use `memref.extract_strided_metadata` to get the base memref since it is
|
||||||
|
// not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
|
||||||
|
// already enforced for `bufferization.dealloc`
|
||||||
|
%base_buffer_memref, ... = memref.extract_strided_metadata %memref
|
||||||
|
: memref<?xi8> -> memref<i8>, index, index, index
|
||||||
|
%base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
|
||||||
|
: memref<?xi8> -> memref<i8>, index, index, index
|
||||||
|
%base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
|
||||||
|
: memref<?xi8> -> memref<i8>, index, index, index
|
||||||
|
|
||||||
|
// The deallocation conditions need to be adjusted to incorporate the branch
|
||||||
|
// condition. In this example, this requires only a single negation, but might
|
||||||
|
// also require multiple arith.andi operations.
|
||||||
|
%not_br_cond = arith.xori %true, %br_cond : i1
|
||||||
|
|
||||||
|
// There are two dealloc operations inserted in this basic block, one per
|
||||||
|
// successor. Both have the same list of MemRefs to deallocate and the
|
||||||
|
// conditions only differ by the branch condition conjunct.
|
||||||
|
// Note, however, that the retained list differs. Here, both contain the
|
||||||
|
// %select value because it is used in both successors (since it's the same
|
||||||
|
// block), but the value passed via block argument differs (%memref vs.
|
||||||
|
// %alloc).
|
||||||
|
%10:2 = bufferization.dealloc
|
||||||
|
(%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
|
||||||
|
: memref<i8>, memref<i8>, memref<i8>)
|
||||||
|
if (%false, %br_cond, %false)
|
||||||
|
retain (%alloc, %select : memref<?xi8>, memref<?xi8>)
|
||||||
|
|
||||||
|
%11:2 = bufferization.dealloc
|
||||||
|
(%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
|
||||||
|
: memref<i8>, memref<i8>, memref<i8>)
|
||||||
|
if (%false, %not_br_cond, %false)
|
||||||
|
retain (%memref, %select : memref<?xi8>, memref<?xi8>)
|
||||||
|
|
||||||
|
// Because %select is used in ^bb1 without passing it via block argument, we
|
||||||
|
// need to update it's ownership value here by merging the ownership values
|
||||||
|
// returned by the dealloc operations
|
||||||
|
%new_ownership = arith.select %br_cond, %10#1, %11#1 : i1
|
||||||
|
|
||||||
|
// The terminator is modified to pass along the ownership indicator values
|
||||||
|
// with each MemRef value.
|
||||||
|
cf.cond_br %br_cond, ^bb1(%alloc, %10#0 : memref<?xi8>, i1),
|
||||||
|
^bb1(%memref, %11#0 : memref<?xi8>, i1)
|
||||||
|
|
||||||
|
// All non-entry basic blocks are modified to have an additional i1 argument for
|
||||||
|
// each MemRef value in the argument list.
|
||||||
|
^bb1(%13: memref<?xi8>, %14: i1): // 2 preds: ^bb0, ^bb0
|
||||||
|
test.copy(%13, %select) : (memref<?xi8>, memref<?xi8>)
|
||||||
|
|
||||||
|
%base_buffer_13, ... = memref.extract_strided_metadata %13
|
||||||
|
: memref<?xi8> -> memref<i8>, index, index, index
|
||||||
|
%base_buffer_select, ... = memref.extract_strided_metadata %select
|
||||||
|
: memref<?xi8> -> memref<i8>, index, index, index
|
||||||
|
|
||||||
|
// Here, we don't have a retained list, because the block has no successors
|
||||||
|
// and the return has no operands.
|
||||||
|
bufferization.dealloc (%base_buffer_13, %base_buffer_select
|
||||||
|
: memref<i8>, memref<i8>)
|
||||||
|
if (%14, %new_ownership)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Buffer Deallocation Simplification Pass
|
||||||
|
|
||||||
|
The [semantics of the `bufferization.dealloc` operation](#bufferizationdealloc-bufferizationdeallocop)
|
||||||
|
provide a lot of opportunities for optimizations which can be conveniently split
|
||||||
|
into patterns using the greedy pattern rewriter. Some of those patterns need
|
||||||
|
access to additional analyses such as an analysis that can determine whether two
|
||||||
|
MemRef values must, may, or never originate from the same buffer allocation.
|
||||||
|
These patterns are collected in the Buffer Deallocation Simplification pass,
|
||||||
|
while patterns that don't need additional analyses are registered as part of the
|
||||||
|
regular canonicalizer pass. This pass is best run after
|
||||||
|
`--ownership-based-buffer-deallocation` followed by `--canonicalize`.
|
||||||
|
|
||||||
|
The pass applies patterns for the following simplifications:
|
||||||
|
* Remove MemRefs from retain list when guaranteed to not alias with any value
|
||||||
|
in the 'memref' operand list. This avoids an additional aliasing check with
|
||||||
|
the removed value.
|
||||||
|
* Split off values in the 'memref' list to new `bufferization.dealloc`
|
||||||
|
operations only containing this value in the 'memref' list when it is
|
||||||
|
guaranteed to not alias with any other value in the 'memref' list. This
|
||||||
|
avoids at least one aliasing check at runtime and enables using a more
|
||||||
|
efficient lowering for this new `bufferization.dealloc` operation.
|
||||||
|
* Remove values from the 'memref' operand list when it is guaranteed to alias
|
||||||
|
with at least one value in the 'retained' list and may not alias any other
|
||||||
|
value in the 'retain' list.
|
||||||
|
|
||||||
|
## Lower Deallocations Pass
|
||||||
|
|
||||||
|
The `-lower-deallocations` pass transforms all `bufferization.dealloc`
|
||||||
|
operations to `memref.dealloc` operations and may also insert operations from
|
||||||
|
the `scf`, `func`, and `arith` dialects to make deallocations conditional and
|
||||||
|
check whether two MemRef values come from the same allocation at runtime (when
|
||||||
|
the `buffer-deallocation-simplification` pass wasn't able to determine it
|
||||||
|
statically).
|
||||||
|
|
||||||
|
The same lowering of the `bufferization.dealloc` operation is also part of the
|
||||||
|
`-convert-bufferization-to-memref` conversion pass which also lowers all the
|
||||||
|
other operations of the bufferization dialect.
|
||||||
|
|
||||||
|
We distinguish multiple cases in this lowering pass to provide an overall more
|
||||||
|
efficient lowering. In the general case, a library function is created to avoid
|
||||||
|
quadratic code size explosion (relative to the number of operands of the dealloc
|
||||||
|
operation). The specialized lowerings aim to avoid this library function because
|
||||||
|
it requires allocating auxiliary MemRefs of index values.
|
||||||
|
|
||||||
|
### Generic Lowering
|
||||||
|
|
||||||
|
A library function is generated to avoid code-size blow-up. On a high level, the
|
||||||
|
base-memref of all operands is extracted as an index value and stored into
|
||||||
|
specifically allocated MemRefs and passed to the library function which then
|
||||||
|
determines whether they come from the same original allocation. This information
|
||||||
|
is needed to avoid double-free situations and to correctly retain the MemRef
|
||||||
|
values in the `retained` list.
|
||||||
|
|
||||||
|
**Dealloc Operation Lowering**
|
||||||
|
|
||||||
|
This lowering supports all features the dealloc operation has to offer. It
|
||||||
|
computes the base pointer of each memref (as an index), stores it in a
|
||||||
|
new memref helper structure and passes it to the helper function generated
|
||||||
|
in `buildDeallocationLibraryFunction`. The results are stored in two lists
|
||||||
|
(represented as MemRefs) of booleans passed as arguments. The first list
|
||||||
|
stores whether the corresponding condition should be deallocated, the
|
||||||
|
second list stores the ownership of the retained values which can be used
|
||||||
|
to replace the result values of the `bufferization.dealloc` operation.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```mlir
|
||||||
|
%0:2 = bufferization.dealloc (%m0, %m1 : memref<2xf32>, memref<5xf32>)
|
||||||
|
if (%cond0, %cond1)
|
||||||
|
retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
|
||||||
|
```
|
||||||
|
lowers to (simplified):
|
||||||
|
```mlir
|
||||||
|
%c0 = arith.constant 0 : index
|
||||||
|
%c1 = arith.constant 1 : index
|
||||||
|
%dealloc_base_pointer_list = memref.alloc() : memref<2xindex>
|
||||||
|
%cond_list = memref.alloc() : memref<2xi1>
|
||||||
|
%retain_base_pointer_list = memref.alloc() : memref<2xindex>
|
||||||
|
%m0_base_pointer = memref.extract_aligned_pointer_as_index %m0
|
||||||
|
memref.store %m0_base_pointer, %dealloc_base_pointer_list[%c0]
|
||||||
|
%m1_base_pointer = memref.extract_aligned_pointer_as_index %m1
|
||||||
|
memref.store %m1_base_pointer, %dealloc_base_pointer_list[%c1]
|
||||||
|
memref.store %cond0, %cond_list[%c0]
|
||||||
|
memref.store %cond1, %cond_list[%c1]
|
||||||
|
%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
|
||||||
|
memref.store %r0_base_pointer, %retain_base_pointer_list[%c0]
|
||||||
|
%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
|
||||||
|
memref.store %r1_base_pointer, %retain_base_pointer_list[%c1]
|
||||||
|
%dyn_dealloc_base_pointer_list = memref.cast %dealloc_base_pointer_list :
|
||||||
|
memref<2xindex> to memref<?xindex>
|
||||||
|
%dyn_cond_list = memref.cast %cond_list : memref<2xi1> to memref<?xi1>
|
||||||
|
%dyn_retain_base_pointer_list = memref.cast %retain_base_pointer_list :
|
||||||
|
memref<2xindex> to memref<?xindex>
|
||||||
|
%dealloc_cond_out = memref.alloc() : memref<2xi1>
|
||||||
|
%ownership_out = memref.alloc() : memref<2xi1>
|
||||||
|
%dyn_dealloc_cond_out = memref.cast %dealloc_cond_out :
|
||||||
|
memref<2xi1> to memref<?xi1>
|
||||||
|
%dyn_ownership_out = memref.cast %ownership_out :
|
||||||
|
memref<2xi1> to memref<?xi1>
|
||||||
|
call @dealloc_helper(%dyn_dealloc_base_pointer_list,
|
||||||
|
%dyn_retain_base_pointer_list,
|
||||||
|
%dyn_cond_list,
|
||||||
|
%dyn_dealloc_cond_out,
|
||||||
|
%dyn_ownership_out) : (...)
|
||||||
|
%m0_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c0] : memref<2xi1>
|
||||||
|
scf.if %m0_dealloc_cond {
|
||||||
|
memref.dealloc %m0 : memref<2xf32>
|
||||||
|
}
|
||||||
|
%m1_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c1] : memref<2xi1>
|
||||||
|
scf.if %m1_dealloc_cond {
|
||||||
|
memref.dealloc %m1 : memref<5xf32>
|
||||||
|
}
|
||||||
|
%r0_ownership = memref.load %dyn_ownership_out[%c0] : memref<2xi1>
|
||||||
|
%r1_ownership = memref.load %dyn_ownership_out[%c1] : memref<2xi1>
|
||||||
|
memref.dealloc %dealloc_base_pointer_list : memref<2xindex>
|
||||||
|
memref.dealloc %retain_base_pointer_list : memref<2xindex>
|
||||||
|
memref.dealloc %cond_list : memref<2xi1>
|
||||||
|
memref.dealloc %dealloc_cond_out : memref<2xi1>
|
||||||
|
memref.dealloc %ownership_out : memref<2xi1>
|
||||||
|
// replace %0#0 with %r0_ownership
|
||||||
|
// replace %0#1 with %r1_ownership
|
||||||
|
```
|
||||||
|
|
||||||
|
**Library function**
|
||||||
|
|
||||||
|
A library function is built per compilation unit that can be called at
|
||||||
|
bufferization dealloc sites to determine whether two MemRefs come from the same
|
||||||
|
allocation and their new ownerships.
|
||||||
|
|
||||||
|
The generated function takes two MemRefs of indices and three MemRefs of
|
||||||
|
booleans as arguments:
|
||||||
|
* The first argument A should contain the result of the
|
||||||
|
extract_aligned_pointer_as_index operation applied to the MemRefs to be
|
||||||
|
deallocated
|
||||||
|
* The second argument B should contain the result of the
|
||||||
|
extract_aligned_pointer_as_index operation applied to the MemRefs to be
|
||||||
|
retained
|
||||||
|
* The third argument C should contain the conditions as passed directly
|
||||||
|
to the deallocation operation.
|
||||||
|
* The fourth argument D is used to pass results to the caller. Those
|
||||||
|
represent the condition under which the MemRef at the corresponding
|
||||||
|
position in A should be deallocated.
|
||||||
|
* The fifth argument E is used to pass results to the caller. It
|
||||||
|
provides the ownership value corresponding the the MemRef at the same
|
||||||
|
position in B
|
||||||
|
|
||||||
|
This helper function is supposed to be called once for each
|
||||||
|
`bufferization.dealloc` operation to determine the deallocation need and
|
||||||
|
new ownership indicator for the retained values, but does not perform the
|
||||||
|
deallocation itself.
|
||||||
|
|
||||||
|
Generated code:
|
||||||
|
```mlir
|
||||||
|
func.func @dealloc_helper(
|
||||||
|
%dyn_dealloc_base_pointer_list: memref<?xindex>,
|
||||||
|
%dyn_retain_base_pointer_list: memref<?xindex>,
|
||||||
|
%dyn_cond_list: memref<?xi1>,
|
||||||
|
%dyn_dealloc_cond_out: memref<?xi1>,
|
||||||
|
%dyn_ownership_out: memref<?xi1>) {
|
||||||
|
%c0 = arith.constant 0 : index
|
||||||
|
%c1 = arith.constant 1 : index
|
||||||
|
%true = arith.constant true
|
||||||
|
%false = arith.constant false
|
||||||
|
%num_dealloc_memrefs = memref.dim %dyn_dealloc_base_pointer_list, %c0
|
||||||
|
%num_retain_memrefs = memref.dim %dyn_retain_base_pointer_list, %c0
|
||||||
|
// Zero initialize result buffer.
|
||||||
|
scf.for %i = %c0 to %num_retain_memrefs step %c1 {
|
||||||
|
memref.store %false, %dyn_ownership_out[%i] : memref<?xi1>
|
||||||
|
}
|
||||||
|
scf.for %i = %c0 to %num_dealloc_memrefs step %c1 {
|
||||||
|
%dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%i]
|
||||||
|
%cond = memref.load %dyn_cond_list[%i]
|
||||||
|
// Check for aliasing with retained memrefs.
|
||||||
|
%does_not_alias_retained = scf.for %j = %c0 to %num_retain_memrefs
|
||||||
|
step %c1 iter_args(%does_not_alias_aggregated = %true) -> (i1) {
|
||||||
|
%retain_bp = memref.load %dyn_retain_base_pointer_list[%j]
|
||||||
|
%does_alias = arith.cmpi eq, %retain_bp, %dealloc_bp : index
|
||||||
|
scf.if %does_alias {
|
||||||
|
%curr_ownership = memref.load %dyn_ownership_out[%j]
|
||||||
|
%updated_ownership = arith.ori %curr_ownership, %cond : i1
|
||||||
|
memref.store %updated_ownership, %dyn_ownership_out[%j]
|
||||||
|
}
|
||||||
|
%does_not_alias = arith.cmpi ne, %retain_bp, %dealloc_bp : index
|
||||||
|
%updated_aggregate = arith.andi %does_not_alias_aggregated,
|
||||||
|
%does_not_alias : i1
|
||||||
|
scf.yield %updated_aggregate : i1
|
||||||
|
}
|
||||||
|
// Check for aliasing with dealloc memrefs in the list before the
|
||||||
|
// current one, i.e.,
|
||||||
|
// `fix i, forall j < i: check_aliasing(%dyn_dealloc_base_pointer[j],
|
||||||
|
// %dyn_dealloc_base_pointer[i])`
|
||||||
|
%does_not_alias_any = scf.for %j = %c0 to %i step %c1
|
||||||
|
iter_args(%does_not_alias_agg = %does_not_alias_retained) -> (i1) {
|
||||||
|
%prev_dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%j]
|
||||||
|
%does_not_alias = arith.cmpi ne, %prev_dealloc_bp, %dealloc_bp
|
||||||
|
%updated_alias_agg = arith.andi %does_not_alias_agg, %does_not_alias
|
||||||
|
scf.yield %updated_alias_agg : i1
|
||||||
|
}
|
||||||
|
%dealloc_cond = arith.andi %does_not_alias_any, %cond : i1
|
||||||
|
memref.store %dealloc_cond, %dyn_dealloc_cond_out[%i] : memref<?xi1>
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Specialized Lowerings
|
||||||
|
|
||||||
|
Currently, there are two special lowerings for common cases to avoid the library
|
||||||
|
function and thus unnecessary memory load and store operations and function
|
||||||
|
calls:
|
||||||
|
|
||||||
|
**One memref, no retained**
|
||||||
|
|
||||||
|
Lower a simple case without any retained values and a single MemRef. Ideally,
|
||||||
|
static analysis can provide enough information such that the
|
||||||
|
`buffer-deallocation-simplification` pass is able to split the dealloc
|
||||||
|
operations up into this simple case as much as possible before running this
|
||||||
|
pass.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```mlir
|
||||||
|
bufferization.dealloc (%arg0 : memref<2xf32>) if (%arg1)
|
||||||
|
```
|
||||||
|
is lowered to
|
||||||
|
```mlir
|
||||||
|
scf.if %arg1 {
|
||||||
|
memref.dealloc %arg0 : memref<2xf32>
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
In most cases, the branch condition is either constant 'true' or 'false' and can
|
||||||
|
thus be optimized away entirely by the canonicalizer pass.
|
||||||
|
|
||||||
|
**One memref, arbitrarily many retained**
|
||||||
|
|
||||||
|
A special case lowering for the deallocation operation with exactly one MemRef,
|
||||||
|
but an arbitrary number of retained values. The size of the code produced by
|
||||||
|
this lowering is linear to the number of retained values.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```mlir
|
||||||
|
%0:2 = bufferization.dealloc (%m : memref<2xf32>) if (%cond)
|
||||||
|
retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
|
||||||
|
return %0#0, %0#1 : i1, i1
|
||||||
|
```
|
||||||
|
is lowered to
|
||||||
|
```mlir
|
||||||
|
%m_base_pointer = memref.extract_aligned_pointer_as_index %m
|
||||||
|
%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
|
||||||
|
%r0_does_not_alias = arith.cmpi ne, %m_base_pointer, %r0_base_pointer
|
||||||
|
%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
|
||||||
|
%r1_does_not_alias = arith.cmpi ne, %m_base_pointer, %r1_base_pointer
|
||||||
|
%not_retained = arith.andi %r0_does_not_alias, %r1_does_not_alias : i1
|
||||||
|
%should_dealloc = arith.andi %not_retained, %cond : i1
|
||||||
|
scf.if %should_dealloc {
|
||||||
|
memref.dealloc %m : memref<2xf32>
|
||||||
|
}
|
||||||
|
%true = arith.constant true
|
||||||
|
%r0_does_alias = arith.xori %r0_does_not_alias, %true : i1
|
||||||
|
%r0_ownership = arith.andi %r0_does_alias, %cond : i1
|
||||||
|
%r1_does_alias = arith.xori %r1_does_not_alias, %true : i1
|
||||||
|
%r1_ownership = arith.andi %r1_does_alias, %cond : i1
|
||||||
|
return %r0_ownership, %r1_ownership : i1, i1
|
||||||
|
```
|
1
mlir/docs/includes/img/bufferization_dealloc_op.svg
Normal file
1
mlir/docs/includes/img/bufferization_dealloc_op.svg
Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 420 KiB |
@ -24,8 +24,8 @@ def Bufferization_Dialect : Dialect {
|
|||||||
|
|
||||||
An overview of the bufferization infrastructure and important conceptual
|
An overview of the bufferization infrastructure and important conceptual
|
||||||
details related to using the MLIR dialect conversion infrastructure can be
|
details related to using the MLIR dialect conversion infrastructure can be
|
||||||
found in [bufferization](/docs/Bufferization/) and [buffer
|
found in [bufferization](/docs/Bufferization/) and [ownership-based buffer
|
||||||
deallocation](/docs/BufferDeallocationInternals/).
|
deallocation](/docs/OwnershipBasedBufferDeallocation/).
|
||||||
}];
|
}];
|
||||||
let dependentDialects = [
|
let dependentDialects = [
|
||||||
"affine::AffineDialect", "memref::MemRefDialect", "tensor::TensorDialect",
|
"affine::AffineDialect", "memref::MemRefDialect", "tensor::TensorDialect",
|
||||||
|
Loading…
x
Reference in New Issue
Block a user