## Summary
Fixes crash when running `--eliminate-empty-tensors` on MLIR modules
containing `sparse_tensor.new` with a tensor input.
## Problem
The `sparse_tensor.new` operation was missing the
`bufferizesToMemoryRead`, `bufferizesToMemoryWrite`, and
`getAliasingValues` methods in its `BufferizableOpInterface`
implementation. This caused an `UNREACHABLE` crash with message
"bufferizesToMemoryRead not implemented".
## Solution
Implemented the missing methods in `NewOpInterface`:
- `bufferizesToMemoryRead`: returns `true` (reads from source tensor)
- `bufferizesToMemoryWrite`: returns `false` (doesn't write to source)
- `getAliasingValues`: returns empty (result is new allocation, not an
alias)
Also added a test case for `sparse_tensor.new` with tensor input.
Fixes#178419
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit simplifies the design of the `RegionBranchOpInterface`. The
property of being a successor input is now independent of the region
branch point.
There is a new API for querying successor inputs:
`RegionBranchOpInterface::getSuccessorInputs(RegionSuccessor)`. Note
that this function does **not** take a `RegionBranchPoint` as parameter.
The `RegionSuccessor` API is now also simpler: it no longer stores
successor inputs. A region successor is simply `Region *`, wrapped
around a convenience API.
Note: This commit is mostly mechanical. Analyses / transformations that
build on top of the `RegionBranchOpInterface` (e.g.,
`visitNonControlFlowArguments` API) can likely be simplified in
follow-up commits.
Note for LLVM integration: Split
`RegionBranchOpInterface::getSuccessorRegion` implementations into two
functions: `getSuccessorRegion` and `getSuccessorInputs. (There are many
examples in this commit.)
RFC:
https://discourse.llvm.org/t/rfc-simplify-regionbranchopinterface-separate-successor-inputs-from-region-successor/89420/7
Simplify the design of `RegionSuccessor`. There is no need to store the
`Operation *` pointer when branching out of the region branch op (to the
parent). There is no API to even access the `Operation *` pointer.
Add a new helper function `RegionSuccessor::parent` to construct a
region successor that points to the parent. This aligns the
`RegionSuccessor` design and API with `RegionBranchPoint`:
* Both classes now have a `parent()` helper function.
`ClassName::parent()` can be used in documentation to precisely describe
the source/target of a region branch.
* Both classes now use `nullptr` internally to represent "parent".
This API change also protects against incorrect API usage: users can no
longer pass an incorrect parent op. If a region successor is not a
region of the region branch op, it *must* branch out of region branch op
itself ("parent"). However, the previous API allowed passing other
operations. There was one such API violation in a [test
case](https://github.com/llvm/llvm-project/pull/174945/files#diff-d5717e4a8d7344b2ff77762b8fa480bcfec0eeee97a86195c787d791a6217e13L71).
Also clean up the documentation to use the correct terminology (such as
"successor operands", "successor inputs") consistently.
Note: This PR effectively rolls back some changes from #161575. That PR
introduced `llvm::PointerUnion<Region *, Operation *>
successor{nullptr};`. It is unclear from the commit message why that
change was made.
Note for LLVM integration: You may have to slightly modify
`getSuccessorRegion` implementations: Replace
`RegionSuccessor(getOperation(), getOperation()->getResults())` with
`RegionSuccessor::parent(getResults())`.
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This PR builds upon the infrastructure set up for Sparse Tensor Loop
Ordering Heuristics (#154656) by adding a preference to have dense loops
outer and sparse loops inner.
As always I'd love to get feedback and know if there's any other
direction to go with this work that might be better.
This PR makes the following improvements to `vector.scatter` and its
lowering pipeline:
- In addition to `memref`, accept a ranked `tensor` as the base operand
of `vector.scatter`, similar to `vector.transfer_write`.
- Implement bufferization support for `vector.scatter`, so that
tensor-based scatter ops can be fully lowered to memref-based forms.
It's worth to complete the functionality of map_scatter decomposition.
Full discussion can be found here:
https://github.com/iree-org/iree/issues/21135
---------
Signed-off-by: Ryutaro Okada <1015ryu88@gmail.com>
When we create a `SparseIterator`, we sometimes wrap it in a
`FilterIterator`, which delegates _some_ calls to the underlying
`SparseIterator`.
After construction, e.g. in `makeNonEmptySubSectIterator()`, we call
`setSparseEmitStrategy()`. This sets the strategy only in one of the
filters -- if we call `setSparseEmitStrategy()` immediately after
creating the `SparseIterator`, then the wrapped `SparseIterator` will
have the right strategy, and the `FilterIterator` strategy will be
unintialized; if we call `setSparseEmitStrategy()` after wrapping the
iterator in `FilterIterator`, then the opposite happens.
If we make `setSparseEmitStrategy()` a virtual method so that it's
included in the `FilterIterator` pattern, and then do all reads of
`emitStrategy` via a virtual method as well, it's pretty simple to
ensure that the value of `strategy` is being set consistently and
correctly.
Without this, the UB of strategy being uninitialized manifests as a
sporadic test failure in
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_strided_conv_2d_nhwc_hwcf.mlir,
when run downstream with the right flags (e.g. asan + assertions off).
The test sometimes fails with `ne_sub<trivial<dense[0,1]>>.begin' op
created with unregistered dialect`. It can also be directly observed w/
msan that this uninitialized read is the cause of that issue, but msan
causes other problems w/ this test.
1. In `Transforms.cpp` the debug macro is accessing a SmallVector
variable that has been moved-from and reset. Fixed by reordering code
for the move-from to happen last.
2. `IterationGraphSorter` Refine the previous use-after-move fix for
style/readability by renaming the private constructor args to resolve
naming ambiguity with the class members.
Testing: `ninja check-mlir`
This is still somehow a WIP, we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested region or the parent
`RegionBranchOpInterface`
Some new methods with reasonnable default implementation are added to
help resolving the flow of values across the RegionBranchOpInterface.
It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires to know about the hidden loop iterator block argument and where
the iterargs start. The API is designed around forward-tracking of the
chain unfortunately.
Try to reland #161575 ; I suspect a buildbot incremental build issue.
This is still somehow a WIP, we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested region or the parent
`RegionBranchOpInterface`
Some new methods with reasonnable default implementation are added to
help resolving the flow of values across the RegionBranchOpInterface.
It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires to know about the hidden loop iterator block argument and where
the iterargs start. The API is designed around forward-tracking of the
chain unfortunately.
These issues affect only Debug builds, and Release builds with asserts
enabled.
1. In `SparseTensor.h` a variable is moved-from within an assert,
introducing a side effect that alters its subsequent use, and causes
divergence between Debug and Release builds (with asserts disabled).
2. In `IterationGraphSorter.cpp`, the class constructor arguments are
moved-from to initialize class member variables via the initializer
list. Because both the arguments and class members are identically
named, there's a naming collision where the arguments shadow their
identically-named member variables counterparts inside the constructor
body. In the original code, unqualified names inside the asserts,
referred to the constructor arguments. This is wrong, because these have
already been moved-from. It's not just a UB, but is broken. These
SmallVector types when moved-from are reset i.e. the size resets to 0.
This actually renders the affected asserts ineffective, since the
comparisons operate on two hollowed-out objects and always succeed. This
name ambiguity is fixed by using 'this->' to correctly refer to the
initialized member variables carrying the relevant state.
3. While the fix 2 above made the asserts act as intended, it also
unexpectedly broke one mlir test: `llvm-lit -v
mlir/test/Dialect/SparseTensor/sparse_scalars.mlir` This required fixing
the assert logic itself, which likely has never worked and went
unnoticed all this time due to the bug 2. Specifically, in the failing
test that uses `mlir/test/Dialect/SparseTensor/sparse_scalars.mlir` the
'%argq' of 'ins' is defined as 'f32' scalar type, but the original code
inside the assert had no support for scalar types as written, and was
breaking the test.
Testing:
```
ninja check-mlir
llvm-lit -v mlir/test/Dialect/SparseTensor/sparse_scalars.mlir
```
As discussed before, this PR adds the basic infrastructure/boiler plate
for loop ordering strategies to be implemented.
If this looks ok, I wanted to also mention some of the heuristics that I
would implement next, if they sound reasonable to you guys:
- Parallel first : prioritize parallel loops over reduction loops
- Dense outer : prioritize the most dense loops first
- Sparse outer : the opposite, potentially useful in some cases?
There is another that I am considering, stride/memory aware, which would
prioritize loops with better stride patterns (like sequential or
linear). Not sure how well this carries over to Sparse Tensor though.
Are there any ideas/heuristics that I should definitely try to
implement?
As we discussed, I will try to incrementally add heuristics. Sorry for
the delay on my end, and thank you so much for the feedback!
---------
Co-authored-by: Aart Bik <ajcbik@google.com>
```
warning: implicit capture of 'this' with a capture default of '=' is deprecated [-Wdeprecated-this-capture]
```
Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
This commit improves the `allowPatternRollback` flag handling in the
dialect conversion driver. Previously, this flag was used to merely
detect cases that are incompatible with the new One-Shot Dialect
Conversion driver. This commit implements the driver itself: when the
flag is set to "false", all IR changes are materialized immediately,
bypassing the `IRRewrite` and `ConversionValueMapping` infrastructure.
A few selected test cases now run with both the old and the new driver.
RFC:
https://discourse.llvm.org/t/rfc-a-new-one-shot-dialect-conversion-driver/79083
As part of 2646c36a864aa6a62bc1280e9a8cd2bcd2695349,
`OneShotModuleBufferize` no longer descends into nested symbol tables,
recommending users who wish to do this should do so in a pass
pipeline/custom pass. This did not support the use case of ops that
weren't ModuleOps. The patch updates `OneShotModuleBufferize` to work on
any general op.
This PR uses `val.getDefiningOp<OpTy>()` to replace `dyn_cast<OpTy>(val.getDefiningOp())` , `dyn_cast_or_null<OpTy>(val.getDefiningOp())` and `dyn_cast_if_present<OpTy>(val.getDefiningOp())`.
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow for `ShapedType::isStaticDim` to be used in
ADT functions without having to wrap it in a lambda performing the
negation.
Also add the new functions to C and Python bindings.
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
One of the common uses of std::nullopt is in one of the constructors
for ValueRange. This patch takes care of the migration where we need
ValueRange() to facilitate perfect forwarding. Note that {} would be
ambiguous for perfecting forwarding to work.
Following the addition of TensorLike and BufferLike type interfaces (see
00eaff3e9c897c263a879416d0f151d7ca7eeaff), introduce minimal changes
required to bufferize a custom tensor operation into a custom buffer
operation.
To achieve this, new interface methods are added to TensorLike type
interface that abstract away the differences between existing (tensor ->
memref) and custom conversions.
The scope of the changes is intentionally limited (for example,
BufferizableOpInterface is untouched) in order to first understand the
basics and reach consensus design-wise.
---
Notable changes:
* mlir::bufferization::getBufferType() returns BufferLikeType (instead
of BaseMemRefType)
* ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType.
Operation argument "memref" renamed to "buffer"
* ToTensorOp's tensor type inferring builder is dropped (users now need
to provide the tensor type explicitly)
Generally, bufferization should be able to create a memref from a tensor
without needing to know more than just a mlir::Type. Thus, change
BufferizationOptions::UnknownTypeConverterFn to accept just a type
(mlir::TensorType for now) instead of mlir::Value. Additionally, apply
the same rationale to getMemRefType() helper function.
Both changes are prerequisites to enable custom types support in
one-shot bufferization.
The PR continues the work started in #141019 by adding the `BufferizationState` class also to the `getBufferType` and `resolveConflicts` interface methods, together with the additional support functions that are used throughout the bufferization infrastructure.
Follow-up on #138143, which was reverted due to a missing update a method signature (more specifically, the bufferization interface for `tensor::ConcatOp`) that was not catched before merging. The old PR description is reported in the next lines.
This PR is a follow-up on https://github.com/llvm/llvm-project/pull/138125, and adds a bufferization state class providing information about the IR. The information currently consists of a cached list of symbol tables, which aims to solve the quadratic scaling of the bufferization task with respect to the number of symbols. The PR breaks API compatibility: the bufferize method of the BufferizableOpInterface has been enriched with a reference to a BufferizationState object.
The bufferization state must be kept in a valid state by the interface implementations. For example, if an operation with the Symbol trait is inserted or replaced, its parent SymbolTable must be updated accordingly (see, for example, the bufferization of arith::ConstantOp, where the symbol table of the module gets the new global symbol inserted). Similarly, the invalidation of a symbol table must be performed if an operation with the SymbolTable trait is removed (this can be performed using the invalidateSymbolTable method, introduced in https://github.com/llvm/llvm-project/pull/138014).
Reverts llvm/llvm-project#138143
The PR for the BufferizationState is temporarily reverted due to API incompatibilities that have been initially missed during the update and were not catched by PR checks.
This PR is a follow-up on #138125, and adds a bufferization state class providing information about the IR. The information currently consists of a cached list of symbol tables, which aims to solve the quadratic scaling of the bufferization task with respect to the number of symbols. The PR breaks API compatibility: the `bufferize` method of the `BufferizableOpInterface` has been enriched with a reference to a `BufferizationState` object.
The bufferization state must be kept in a valid state by the interface implementations. For example, if an operation with the `Symbol` trait is inserted or replaced, its parent `SymbolTable` must be updated accordingly (see, for example, the bufferization of `arith::ConstantOp`, where the symbol table of the module gets the new global symbol inserted). Similarly, the invalidation of a symbol table must be performed if an operation with the `SymbolTable` trait is removed (this can be performed using the `invalidateSymbolTable` method, introduced in #138014).
The revision adds isOneInteger helper, and simplifies the existing code
with the two methods. It removes some lambda, which makes code cleaner.
For downstream users, you can update the code with the below script.
```bash
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.h
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.cpp
```
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>