The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.
What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operations operands in the block will
point to the argument we already inserted. This needs to happen to all
the arguments we pass to the different successors of the parent block
- After merged the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid . The solution I found is to do a more prudent
simplification when running that pass.
**Note-1**: I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
**Note-2**: I fixed a bug found by @Dinistro in #97697 . The issue was
that, when looking for redundant arguments, I was not considering that
the block might have already some arguments. So the index (in the block
args list) of the i-th `newArgument` is `i+numOfOldArguments`.
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.
What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operations operands in the block will
point to the argument we already inserted. This needs to happen to all
the arguments we pass to the different successors of the parent block
- After merged the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid . The solution I found is to do a more prudent
simplification when running that pass.
**Note**: this a rework of #96871 . I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.
What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operations operands in the block will
point to the argument we already inserted
- After merged the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid . The solution I found is to do a more prudent
simplification when running that pass.
**Note**: many tests are still not passing. But I wanted to submit the
code before changing all the tests (and probably adding a couple), so
that we can agree in principle on the algorithm/design.
This commit adds the `BufferOriginAnalysis`, which can be queried to
check if two buffer SSA values originate from the same allocation. This
new analysis is used in the buffer deallocation pass to fold away or
simplify `bufferization.dealloc` ops more aggressively.
The `BufferOriginAnalysis` is based on the `BufferViewFlowAnalysis`,
which collects buffer SSA value "same buffer" dependencies. E.g., given
IR such as:
```
%0 = memref.alloc()
%1 = memref.subview %0
%2 = memref.subview %1
```
The `BufferViewFlowAnalysis` will report the following "reverse"
dependencies (`resolveReverse`) for `%2`: {`%2`, `%1`, `%0`}. I.e., all
buffer SSA values in the reverse use-def chain that originate from the
same allocation as `%2`. The `BufferOriginAnalysis` is built on top of
that. It handles only simple cases at the moment and may conservatively
return "unknown" around certain IR with branches, memref globals and
function arguments.
This analysis enables additional simplifications during
`-buffer-deallocation-simplification`. In particular, "regular" scf.for
loop nests, that yield buffers (or reallocations thereof) in the same
order as they appear in the iter_args, are now handled much more
efficiently. Such IR patterns are generated by the sparse compiler.
This commit simplifies a helper function in the ownership-based buffer
deallocation pass. Fixes a potential double-free (depending on the
scheduling of patterns).
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
Fixes a bug in `SplitDeallocWhenNotAliasingAnyOther`. This pattern used
to generate invalid IR (op dominance error). We never noticed this bug
in existing test cases because other patterns and/or foldings were
applied afterwards and those rewrites "fixed up" the IR again. (The bug
is visible when running `mlir-opt -debug`.) Also add additional comments
to the implementation and simplify the code a bit.
Apart from the fixed dominance error, this change is NFC. Without this
change, buffer deallocation tests will fail when running with #74270.
Values that are the result of buffer allocation ops are guaranteed to
*not* be the same allocation as block arguments of containing blocks.
This fact can be used to allow for more aggressive simplification of
`bufferization.dealloc` ops.
This new pattern allows us to simplify the dealloc result value (by replacing
it with a constant 'true') and to trim the 'memref' operand list when we know
that all retained memrefs alias with one in the 'memref' list that has a
constant 'true' condition. Because the conditions of aliasing memrefs are
combined by disjunction, we know that once a single constant 'true' value is in
the disjunction the remaining elements don't matter anymore. This complements
the RemoveDeallocMemrefsContainedInRetained pattern which removes values from
the 'memref' list when static information is available for all retained values
by also allowing to remove values in the presence of may-aliases, but under
above mentioned condition instead.
The BufferDeallocation pass often adds dealloc operations where the memref and
retain lists are the same and all conditions are 'true'. If the operands are
all function arguments, for example, they are always determined to may-alias
which renders the other patterns invalid, but the op could still be trivially
optimized away. It would even be enough to directly compare the two operand
lists and check the conditions are all constant 'true' (plus checking for the
extract_strided_metadata operation), but this pattern is a bit more general and
still works when there are additional memrefs in the 'memref' list that actually
have to be deallocated (e.g., see regression test).
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158518
We are allowed to remove any values from the `memref` list for which there is no
memref in the `retained` list with a may-alias relation. Before removing, we
just have to make sure that the corresponding op results for all retained
memrefs with must-alias relation are updated accordingly. This means, the the
condition operand has to become part of the disjunction the result value is
computed with.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158395
Add a pattern that splits one dealloc operation into multiple dealloc operation
depending on static aliasing information of the values in the `memref` operand
list. This reduces the total number of aliasing checks required at runtime and
can enable futher canonicalizations of the new and simplified dealloc
operations.
Depends on D157407
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157508
Adds a pattern that removes memrefs from the `retained` list which are
guaranteed to not alias any memref in the `memrefs` list. The corresponding
result value can be replaced with `false` in that case according to the
operation description.
When applied after BufferDeallocation, this can considerably reduce the
overhead that needs to be added during the lowering of the dealloc operation to
check for aliasing (especially when there is only one element in the `memref`
list and all `retained` values can be removed).
Depends on D157398
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157407
Adds a pass that can be run after buffer deallocation to simplify the deallocation operations.
In particular, there are patterns that need alias information and thus cannot be added as a regular canonicalization pattern.
This initial commit moves an incorrect canonicalization pattern from over to this new pass and fixes it by querying the alias analysis for the additional information it needs to be correct (there must not by any potential aliasing memref in the retain list other than the currently mached one).
Also, improves this pattern by considering the `extract_strided_metadata` operation which is inserted by the deallocation pass by default.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157398