This patch fixes `-Wreturn-type` warnings that occur when MLIR is built
with GCC (detected with GCC 11.5).
Warnings found:
```
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp: In function ‘MlirGreedyRewriteStrictness mlirGreedyRewriteDriverConfigGetStrictness(MlirGreedyRewriteDriverConfig)’:
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp:399:1: warning: control reaches end of non-void function [-Wreturn-type]
399 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp: In function ‘MlirGreedySimplifyRegionLevel mlirGreedyRewriteDriverConfigGetRegionSimplificationLevel(MlirGreedyRewriteDriverConfig)’:
build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp:414:1: warning: control reaches end of non-void function [-Wreturn-type]
414 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp: In member function ‘mlir::Speculation::Speculatability mlir::gpu::SubgroupBroadcastOp::getSpeculatability()’:
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:2522:1: warning: control reaches end of non-void function [-Wreturn-type]
2522 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp: In member function ‘llvm::LogicalResult mlir::gpu::SubgroupBroadcastOp::verify()’:
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:2537:1: warning: control reaches end of non-void function [-Wreturn-type]
2537 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractToNeonPatterns.cpp: In member function ‘mlir::Value {anonymous}::VectorContractRewriter::createMMLA(mlir::PatternRewriter&, mlir::Location, mlir::Value, mlir::Value, mlir::Value)’:
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractToNeonPatterns.cpp:153:3: warning: control reaches end of non-void function [-Wreturn-type]
153 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp: In function ‘std::pair<long int, long int> mlir::linalg::getFmrFromWinogradConv2DFmr(mlir::linalg::WinogradConv2DFmr)’:
build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp:3776:1: warning: control reaches end of non-void function [-Wreturn-type]
3776 | }
| ^
build/llvm-llvmorg-21.1.8/mlir/test/lib/Dialect/Test/TestOpDefs.cpp: In function ‘llvm::StringLiteral getVisibilityString(mlir::SymbolTable::Visibility)’:
build/llvm-llvmorg-21.1.8/mlir/test/lib/Dialect/Test/TestOpDefs.cpp:37:1: warning: control reaches end of non-void function [-Wreturn-type]
37 | }
| ^
```
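The warnings above typically come from a `switch` that returns in every enumerator case but has no statement after it. A minimal sketch of the pattern and one way to silence it follows; the enum and function names are hypothetical, and `std::abort()` is only a stand-in for a marker such as `llvm_unreachable` (the message does not say which fix the patch actually uses).

```cpp
#include <cassert>
#include <cstdlib>
#include <string_view>

// Hypothetical stand-in for an enum such as mlir::SymbolTable::Visibility.
enum class Visibility { Public, Private, Nested };

// Every enumerator returns, yet GCC warns without the trailing statement,
// because an enum object may hold a value that matches no enumerator.
inline std::string_view visibilityString(Visibility v) {
  switch (v) {
  case Visibility::Public:
    return "public";
  case Visibility::Private:
    return "private";
  case Visibility::Nested:
    return "nested";
  }
  // Assumption: a noreturn call after the switch; std::abort() stands in for
  // llvm_unreachable("unknown visibility") in this standalone sketch.
  std::abort();
}
```

Because `std::abort` is declared `[[noreturn]]`, the compiler can see that control never reaches the closing brace.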
Fix this build error, which is reported by some compilers after #175815:
```
error: operands to ?: have different types ‘mlir::Operation::result_range {aka mlir::ResultRange}’ and ‘mlir::ValueRange’
return successor.isParent() ? getOperation()->getResults() : ValueRange();
```
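One way to avoid this class of error is to give both arms of `?:` the same type explicitly. The sketch below uses hypothetical stand-in types (`ResultRangeLike`, `ValueRangeLike`) rather than the real MLIR classes, and is not necessarily the exact change made in this commit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for mlir::ResultRange and mlir::ValueRange.
struct ResultRangeLike {
  std::size_t count;
};

struct ValueRangeLike {
  std::size_t count = 0;
  ValueRangeLike() = default;
  // Implicit converting constructor, similar in spirit to ValueRange.
  ValueRangeLike(const ResultRangeLike &results) : count(results.count) {}
};

// Both arms have type ValueRangeLike, so no compiler has to deduce a common
// type for the conditional expression.
inline ValueRangeLike successorInputs(bool isParent,
                                      const ResultRangeLike &results) {
  return isParent ? ValueRangeLike(results) : ValueRangeLike();
}
```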
This commit simplifies the design of the `RegionBranchOpInterface`. The
property of being a successor input is now independent of the region
branch point.
There is a new API for querying successor inputs:
`RegionBranchOpInterface::getSuccessorInputs(RegionSuccessor)`. Note
that this function does **not** take a `RegionBranchPoint` as parameter.
The `RegionSuccessor` API is now also simpler: it no longer stores
successor inputs. A region successor is simply a `Region *` wrapped in a
convenience API.
Note: This commit is mostly mechanical. Analyses / transformations that
build on top of the `RegionBranchOpInterface` (e.g.,
`visitNonControlFlowArguments` API) can likely be simplified in
follow-up commits.
Note for LLVM integration: Split
`RegionBranchOpInterface::getSuccessorRegion` implementations into two
functions: `getSuccessorRegion` and `getSuccessorInputs`. (There are many
examples in this commit.)
RFC:
https://discourse.llvm.org/t/rfc-simplify-regionbranchopinterface-separate-successor-inputs-from-region-successor/89420/7
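The split described above can be pictured with a toy model. Everything in this sketch is hypothetical (`ToyLoopOp`, values modeled as plain `int`s, simplified signatures); it only illustrates that successor inputs are queried from the successor alone, without a branch point.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; SSA values are modeled as plain ints.
struct Region {};

// In this model a successor is just a region; nullptr stands for "parent".
struct RegionSuccessor {
  Region *region = nullptr;
  bool isParent() const { return region == nullptr; }
};

struct ToyLoopOp {
  Region body;
  std::vector<int> results;  // results of the op itself
  std::vector<int> bodyArgs; // block arguments of the body region

  // Part 1: where may control flow go? No inputs are stored here.
  std::vector<RegionSuccessor> getSuccessorRegions() {
    return {RegionSuccessor{&body}, RegionSuccessor{}};
  }

  // Part 2: which values does a given successor receive? Note that no
  // branch point is passed in.
  const std::vector<int> &getSuccessorInputs(RegionSuccessor successor) const {
    return successor.isParent() ? results : bodyArgs;
  }
};
```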
Simplify the design of `RegionSuccessor`. There is no need to store the
`Operation *` pointer when branching out of the region branch op (to the
parent). There is no API to even access the `Operation *` pointer.
Add a new helper function `RegionSuccessor::parent` to construct a
region successor that points to the parent. This aligns the
`RegionSuccessor` design and API with `RegionBranchPoint`:
* Both classes now have a `parent()` helper function.
`ClassName::parent()` can be used in documentation to precisely describe
the source/target of a region branch.
* Both classes now use `nullptr` internally to represent "parent".
This API change also protects against incorrect API usage: users can no
longer pass an incorrect parent op. If a region successor is not a
region of the region branch op, it *must* branch out of the region branch op
itself ("parent"). However, the previous API allowed passing other
operations. There was one such API violation in a [test
case](https://github.com/llvm/llvm-project/pull/174945/files#diff-d5717e4a8d7344b2ff77762b8fa480bcfec0eeee97a86195c787d791a6217e13L71).
Also clean up the documentation to use the correct terminology (such as
"successor operands", "successor inputs") consistently.
Note: This PR effectively rolls back some changes from #161575. That PR
introduced `llvm::PointerUnion<Region *, Operation *>
successor{nullptr};`. It is unclear from the commit message why that
change was made.
Note for LLVM integration: You may have to slightly modify
`getSuccessorRegion` implementations: Replace
`RegionSuccessor(getOperation(), getOperation()->getResults())` with
`RegionSuccessor::parent(getResults())`.
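The aligned design can be sketched as two small classes that both use `nullptr` for "parent" and both expose a `parent()` factory. This is a simplified model with hypothetical `Region` and `Terminator` types; successor inputs are deliberately left out.

```cpp
#include <cassert>

struct Region {};
struct Terminator {};

// Model of a branch point: a terminator in a nested region, or the parent.
class RegionBranchPoint {
public:
  explicit RegionBranchPoint(Terminator *terminator)
      : terminator(terminator) {}
  static RegionBranchPoint parent() { return RegionBranchPoint(nullptr); }
  bool isParent() const { return terminator == nullptr; }

private:
  Terminator *terminator; // nullptr represents "parent"
};

// Model of a successor: a nested region, or the parent. There is no way to
// pass an arbitrary operation as the parent.
class RegionSuccessor {
public:
  explicit RegionSuccessor(Region *region) : region(region) {}
  static RegionSuccessor parent() { return RegionSuccessor(nullptr); }
  bool isParent() const { return region == nullptr; }

private:
  Region *region; // nullptr represents "parent"
};
```

Since the factory takes no operation, the "wrong parent op" mistake described above cannot be expressed in this model.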
This PR fixes a bug in `arith::SelectOp::inferResultRangesFromOptional`
where uninitialized SelectOp branch int ranges were incorrectly joined
with initialized int ranges during dataflow analysis, leading to
incorrect folding in `-int-range-optimizations`.
**The Issue:**
When an `arith.select` branch has an uninitialized range (e.g., from an
op like `nvvm.read.ptx.sreg.cluster.ctaid.x`, `scf.switch`, `llvm.call`,
... that lacks range inference), the analysis computed
`IntegerValueRange::join(Uninitialized, Constant) = Constant`. This
caused the `arith.select` to be replaced with the constant, ignoring the
dynamic branch.
**Example:**
```mlir
// Bug before the fix: -int-range-optimizations replaces %1 with %c32,
// leading to incorrect results and unsafe behavior.
%0 = nvvm.read.ptx.sreg.cluster.ctaid.x : i32 // Uninitialized int range
%c32 = arith.constant 32 : i32
%1 = arith.select %cond, %0, %c32 : i32
```
**The Fix:**
Explicitly ensure in `inferResultRangesFromOptional` that all select cases
have initialized ranges before combining them. If any case is uninitialized,
the result is now treated as the max range. Also added a default max range for
`nvvmInferResultRanges` and the `test.without_bounds` op to simulate and
test uninitialized ranges.
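The difference between the old and new behavior can be sketched with `std::optional` standing in for a possibly uninitialized range. The types and function names here (`IntRange`, `joinIgnoringUninitialized`, `inferSelectRange`) are hypothetical and much simpler than `IntegerValueRange`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

struct IntRange {
  int64_t min;
  int64_t max;
};
// std::nullopt models an uninitialized range.
using MaybeRange = std::optional<IntRange>;

inline IntRange maxRange() {
  return {std::numeric_limits<int64_t>::min(),
          std::numeric_limits<int64_t>::max()};
}

inline IntRange rangeUnion(IntRange a, IntRange b) {
  return {std::min(a.min, b.min), std::max(a.max, b.max)};
}

// Old behavior in this model: an uninitialized side is simply ignored, so
// join(Uninitialized, Constant) yields Constant.
inline MaybeRange joinIgnoringUninitialized(MaybeRange a, MaybeRange b) {
  if (!a)
    return b;
  if (!b)
    return a;
  return rangeUnion(*a, *b);
}

// Fixed behavior in this model: any uninitialized case gives the max range.
inline IntRange inferSelectRange(MaybeRange trueCase, MaybeRange falseCase) {
  if (!trueCase || !falseCase)
    return maxRange();
  return rangeUnion(*trueCase, *falseCase);
}
```

With an unknown true branch and a constant 32 on the false branch, the first function reports the range [32, 32], which is what allowed the incorrect fold; the second reports the full range.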
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This adds a test of the MLIR TableGen `OpBuilder` syntax with move-only
parameter types. Additionally, an overload is added to test defining a
builder outside of the TableGen interface.
The current implementation of `reifyResultShapes` forces all
implementations to return all dimensions of all results. This can be
wasteful when you only require the dimensions of one result, or a single
dimension of a result. It also creates issues when using
patterns to resolve `tensor.dim` and `memref.dim` operations, since
the extra operations created cause the pattern rewriter to enter
an infinite loop (eventually broken by the iteration limit of the
pattern rewriter). This is demonstrated by some
of the test cases added here that hit this limit when using
`--resolve-shaped-type-result-dims` and
`--resolve-ranked-shaped-type-result-dims`. To resolve this issue, the
interface should allow creating just the operations needed. This
change is the first step in resolving this.
The original implementation was written with the restriction in mind that
it might not always be possible to compute the dimensions of a single
result, or one dimension of a single result, in all cases. To account
for such cases, two additional interface methods are added:
- `reifyShapeOfResult` (which allows reifying the dimensions of
just one result) has a default implementation that calls
`reifyResultShapes` and returns the dimensions of a single result.
- `reifyDimOfResult` (which allows reifying a single dimension of a
single result) has a default implementation that calls
`reifyShapeOfResult` and returns the value for the dimension of the
result (which in turn, for the default case, would call
`reifyResultShapes`).
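The chain of default implementations can be sketched with ordinary virtual functions. This is a simplified model: the real interface methods take a builder and return a `LogicalResult`, and the class names and the `fullReifications` counter below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

struct ReifyModel {
  virtual ~ReifyModel() = default;

  // Must be provided: all dimensions of all results.
  virtual std::vector<Shape> reifyResultShapes() = 0;

  // Default: compute everything, keep one result.
  virtual Shape reifyShapeOfResult(unsigned result) {
    return reifyResultShapes().at(result);
  }

  // Default: compute one result's shape, keep one dimension.
  virtual int64_t reifyDimOfResult(unsigned result, unsigned dim) {
    return reifyShapeOfResult(result).at(dim);
  }
};

// Relies on the defaults: every query triggers a full reification.
struct DefaultOnlyOp : ReifyModel {
  int fullReifications = 0;
  std::vector<Shape> shapes = {Shape{4, 8}, Shape{16}};

  std::vector<Shape> reifyResultShapes() override {
    ++fullReifications;
    return shapes;
  }
};

// Overrides the finest-grained method and avoids the full reification.
struct DirectDimOp : DefaultOnlyOp {
  int64_t reifyDimOfResult(unsigned result, unsigned dim) override {
    return shapes.at(result).at(dim);
  }
};
```

In this model, querying a single dimension of `DefaultOnlyOp` still computes every shape, while `DirectDimOp` answers the same query without touching `reifyResultShapes`.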
While this change sets up the interface, ideally most operations will
implement `reifyDimOfResult` when possible. This is possible for almost
all operations in tree. Subsequent commits will change those
incrementally.
Some of the tests added here, which check that the default
implementations of the above methods work as expected, also end up
hitting the pattern rewriter limit when using
`--resolve-shaped-type-result-dims`/
`--resolve-ranked-shaped-type-result-dims`. For testing purposes, a
flag is added to these passes that ignores the error returned by the
pattern application (this flag is left on by default to maintain the
current state).
Changes required downstream to integrate this change:
1. In operation definitions in .td files, for those operations that
implement the `ReifyRankedShapedTypeOpInterface`,
```
def <op-name> : Op<..., [...,
DeclareOpInterfaceMethods<ReifyRankedShapedTypeOpInterface>]>
```
should be changed to
```
def <op-name> : Op<..., [...,
DeclareOpInterfaceMethods<ReifyRankedShapedTypeOpInterface, [
"reifyResultShapes"]>]>
```
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Use `dyn_cast` instead of `cast` and return early if the op does not
implement the `DestinationStyleOpInterface`. Before this change, the
following IR would cause a segfault when the transform interpreter is
run, where `myop.a` and `myop.b` implement the `TilingInterface` but not
the `DestinationStyleOpInterface`. I looked for ops in the upstream
dialects that implement the `TilingInterface` but not the
`DestinationStyleOpInterface` to add a test, but could not find any.
```mlir
module {
func.func @fuse(%arg0: tensor<4x4x4xf32>, %arg1: tensor<4x4x4xf32>) -> tensor<4x4x4xf32> {
%mul = "myop.a"(%arg0, %arg1) : (tensor<4x4x4xf32>, tensor<4x4x4xf32>) -> tensor<4x4x4xf32>
%add = "myop.b"(%mul, %mul) : (tensor<4x4x4xf32>, tensor<4x4x4xf32>) -> tensor<4x4x4xf32>
return %add : tensor<4x4x4xf32>
}
transform.sequence failures(propagate) {
^bb0(%func: !transform.any_op):
%mul = transform.structured.match ops{["myop.a"]} in %func : (!transform.any_op) -> !transform.any_op
%add = transform.structured.match ops{["myop.b"]} in %func : (!transform.any_op) -> !transform.any_op
%loop, %tiled = transform.structured.tile_using_forall %add tile_sizes [1, 2, 4] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
%mul_fused, %mul_containing = transform.structured.fuse_into_containing_op %mul into %tiled : (!transform.any_op, !transform.any_op) -> (!transform.any_op, !transform.any_op)
}
}
```
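The checked-cast-with-early-return pattern can be sketched in standalone C++, with `dynamic_cast` standing in for `llvm::dyn_cast`. The class names and the `getNumInits` helper are hypothetical.

```cpp
#include <cassert>
#include <optional>

struct Op {
  virtual ~Op() = default;
};
// Models an op that is tileable but not destination-style.
struct TilingOnlyOp : Op {};
// Models an op that implements the destination-style interface.
struct DestinationStyleOp : Op {
  unsigned numInits = 1;
};

// A checked cast that may fail, followed by an early return, instead of an
// unchecked cast that assumes the interface is always implemented.
inline std::optional<unsigned> getNumInits(Op *op) {
  auto *dstOp = dynamic_cast<DestinationStyleOp *>(op);
  if (!dstOp)
    return std::nullopt; // Not destination-style: bail out gracefully.
  return dstOp->numInits;
}
```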
This is still somewhat of a WIP; we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
`RegionBranchPoint` and `RegionSuccessor` more robust and aligned with their
definitions:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested regions or the parent
`RegionBranchOpInterface`.
Some new methods with reasonable default implementations are added to
help resolve the flow of values across the `RegionBranchOpInterface`.
It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example, when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires knowing about the hidden loop iterator block argument and where
the iterargs start. Unfortunately, the API is designed around forward
tracking of the chain.
Try to reland #161575; I suspect a buildbot incremental build issue.
Support custom types (4/N): test that it is possible to customize memref
layout specification for custom operations and function boundaries.
This is purely a test setup (no API modifications) to ensure users are
able to pass information from tensors to memrefs within the bufferization
process. To achieve this, a test pass is required (since bufferization
options have to be set manually). As there is already a
--test-one-shot-module-bufferize pass present, it is extended for the
purpose.
When overriding `getVisibility` and/or `setVisibility`, the interface
methods calling them do not pick up the overridden version. Instead, it is
necessary to override all the other methods as well. This adjusts these
interface methods to use the overridden version when available.
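The intended behavior can be illustrated with plain virtual dispatch, used here only as a stand-in for MLIR's interface mechanism. The class names are hypothetical; the point is that helper methods route through the overridable accessors.

```cpp
#include <cassert>

enum class Visibility { Public, Private, Nested };

struct SymbolModel {
  virtual ~SymbolModel() = default;

  // The two overridable primitives.
  virtual Visibility getVisibility() const { return stored; }
  virtual void setVisibility(Visibility v) { stored = v; }

  // Helpers go through the primitives, so an override is picked up
  // without re-implementing each helper.
  bool isPublic() const { return getVisibility() == Visibility::Public; }
  bool isPrivate() const { return getVisibility() == Visibility::Private; }
  void setPrivate() { setVisibility(Visibility::Private); }

private:
  Visibility stored = Visibility::Public;
};

// Overrides only getVisibility; isPublic/isPrivate follow automatically.
struct AlwaysPrivateSymbol : SymbolModel {
  Visibility getVisibility() const override { return Visibility::Private; }
};
```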
Support custom types (2/N): allow value-owning operations (e.g.
allocation ops) to bufferize custom tensors into custom buffers. This
requires BufferizableOpInterface::getBufferType() to return
BufferLikeType instead of BaseMemRefType.
Affected implementors of the interface are updated accordingly.
Relates to ee070d08163ac09842d9bf0c1315f311df39faf1.
This commit makes the following changes:
- Expose `map` and `mapOperands` in
`ValueBoundsConstraintSet::Variable`, so that the class can be used by
subclasses of `ValueBoundsConstraintSet`. Otherwise subclasses cannot
access those members.
- Add `ValueBoundsConstraintSet::strongCompare`. This method is similar
to `ValueBoundsConstraintSet::compare` except that it returns false when
the inverse comparison holds, and `llvm::failure()` if neither the
relation nor its inverse relation could be proven.
- Add `simplifyAffineMinOp`, `simplifyAffineMaxOp`, and
`simplifyAffineMinMaxOps` to simplify those operations using
`ValueBoundsConstraintSet`.
- Add the `SimplifyMinMaxAffineOpsOp` transform op that uses
`simplifyAffineMinMaxOps`.
- Add the `test.value_with_bounds` op to test unknown values with a
min/max range using `ValueBoundsOpInterface`.
- Add tests verifying the transform.
Example:
```mlir
func.func @overlapping_constraints() -> (index, index) {
%0 = test.value_with_bounds {min = 0 : index, max = 192 : index}
%1 = test.value_with_bounds {min = 128 : index, max = 384 : index}
%2 = test.value_with_bounds {min = 256 : index, max = 512 : index}
%r0 = affine.min affine_map<()[s0, s1, s2] -> (s0, s1, s2)>()[%0, %1, %2]
%r1 = affine.max affine_map<()[s0, s1, s2] -> (s0, s1, s2)>()[%0, %1, %2]
return %r0, %r1 : index, index
}
// Result of applying `simplifyAffineMinMaxOps` to `func.func`
#map1 = affine_map<()[s0, s1] -> (s1, s0)>
func.func @overlapping_constraints() -> (index, index) {
%0 = test.value_with_bounds {max = 192 : index, min = 0 : index}
%1 = test.value_with_bounds {max = 384 : index, min = 128 : index}
%2 = test.value_with_bounds {max = 512 : index, min = 256 : index}
%3 = affine.min #map1()[%0, %1]
%4 = affine.max #map1()[%1, %2]
return %3, %4 : index, index
}
```
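The three-valued result of `strongCompare` can be sketched on plain inclusive bounds, with `std::optional<bool>` standing in for a `FailureOr<bool>`-style result. The `Bounds` type and function names are hypothetical, and only a "less than" relation is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Inclusive bounds of an otherwise unknown value.
struct Bounds {
  int64_t min;
  int64_t max;
};

// compare-like: true only if "a < b" is proven for all possible values.
inline bool compareLess(Bounds a, Bounds b) { return a.max < b.min; }

// strongCompare-like: true if "a < b" is proven, false if the inverse
// "a >= b" is proven, and no value if neither can be proven.
inline std::optional<bool> strongCompareLess(Bounds a, Bounds b) {
  if (compareLess(a, b))
    return true;
  if (a.min >= b.max)
    return false;
  return std::nullopt;
}
```

With the bounds from the example, [0, 192] is provably less than [256, 512], which is why the third operand can be dropped from the `affine.min`; [0, 192] and [128, 384] overlap, so neither relation is provable and both operands stay.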
---------
Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Following the addition of TensorLike and BufferLike type interfaces (see
00eaff3e9c897c263a879416d0f151d7ca7eeaff), introduce minimal changes
required to bufferize a custom tensor operation into a custom buffer
operation.
To achieve this, new interface methods are added to TensorLike type
interface that abstract away the differences between existing (tensor ->
memref) and custom conversions.
The scope of the changes is intentionally limited (for example,
BufferizableOpInterface is untouched) in order to first understand the
basics and reach consensus design-wise.
---
Notable changes:
* mlir::bufferization::getBufferType() returns BufferLikeType (instead
of BaseMemRefType)
* ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType.
Operation argument "memref" renamed to "buffer"
* ToTensorOp's tensor type inferring builder is dropped (users now need
to provide the tensor type explicitly)
I observed that we have boundary comments in the codebase like:
```
//===----------------------------------------------------------------------===//
// ...
//===----------------------------------------------------------------------===//
```
I also observed that there are incomplete boundary comments. The
revision is generated by a script that completes the boundary comments.
```
//===----------------------------------------------------------------------===//
// ...
...
```
Signed-off-by: hanhanW <hanhan0912@gmail.com>
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.
This PR is the first part of it:
* Add `OpAsmTypeInterface` and `getAsmName` API for deducing ASM name
from type
* Add default impl in `OpAsmOpInterface` to respect this API when
available.
The `OpAsmAttrInterface` / hooking into Alias system part should be
another PR, using a `getAlias` API.
### Discussion
* Instead of using `StringRef getAsmName()` as the API, I use `void
getAsmName(OpAsmSetNameFn)`, as returning a `StringRef` might be unsafe
(a `std::string` constructed inside and then returned as a _ref_); this
also aligns with the design of `getAsmResultNames`.
* On the result packing of an op, the current approach is that when not
all of the result types are `OpAsmTypeInterface`, nothing is done (old
default impl).
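The lifetime argument for the callback-style API can be sketched as follows. `SetNameFn`, `ToyType`, and `asmNameOf` are hypothetical stand-ins (with `std::string_view` in place of `StringRef`); the real `OpAsmSetNameFn` signature may differ.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a "set name" callback.
using SetNameFn = std::function<void(std::string_view)>;

struct ToyType {
  int width;

  // The name is built on the fly. Passing it to a callback keeps the
  // std::string alive for the duration of the call, whereas returning a
  // non-owning view of `name` would dangle once this function returns.
  void getAsmName(const SetNameFn &setName) const {
    std::string name = "i" + std::to_string(width);
    setName(name);
  }
};

// The caller copies the characters while they are still valid.
inline std::string asmNameOf(const ToyType &type) {
  std::string result;
  type.getAsmName([&](std::string_view name) { result = std::string(name); });
  return result;
}
```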
### Review
Cc @j2kun and @Alexanderviand-intel for downstream; Cc @River707 and
@joker-eph for relevant commit history; Cc @ftynse for discourse.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Treat the integer range for a vector type as the union of the ranges of its
individual elements. With these semantics, most arith ops on vectors will
work out of the box; the only special handling needed is for constants and
vector element manipulation ops.
The end goal of these changes is to be able to optimize vectorized index
calculations.
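A minimal sketch of these semantics, using a hypothetical `IntRange` type: the range of a constant vector is the union of its element values, and an elementwise op then reuses the scalar rule unchanged. Overflow is ignored here for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct IntRange {
  int64_t min;
  int64_t max;
};

// Range of a constant vector: the union of its element values.
// Assumes `elements` is non-empty.
inline IntRange rangeOfConstantVector(const std::vector<int64_t> &elements) {
  auto [lo, hi] = std::minmax_element(elements.begin(), elements.end());
  return {*lo, *hi};
}

// Scalar rule for addition, applied as-is to the per-vector range.
// Overflow handling is omitted in this sketch.
inline IntRange addRanges(IntRange a, IntRange b) {
  return {a.min + b.min, a.max + b.max};
}
```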
Mem2Reg assumes SSA dependencies but did not check for graph regions.
This fixes it.
---------
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
`elementPtrs` has changed meaning over time, and the name is now
outdated, which may be confusing. This PR updates it to a name
representative of current usage.
This patch adds more precise side effects to the current ops with memory
effects, allowing us to determine which OpOperand/OpResult/BlockArgument
the operation reads or writes, rather than just recording the reading and
writing of values. This allows for convenient use of precise side effects to
achieve analysis and optimization.
Related discussions:
https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
This patch adapts the `test.reflect_bounds` test Op to use explicitly
signed and unsigned representation for signed and unsigned bounds of
`IntegerType`s.
This is mostly a cosmetic change as the internal representation of the
ranges is unchanged. However, it improves readability of tests.
This PR is in preparation for some extensions to the
`InferIntRangeInterface` around the `nsw` and `nuw` flags supported in
the `arith` dialect and LLVM.
We provide some common inference logic for `index` and `arith` in
`InferIntRangeCommon.h` but our Test Ops are currently fixed to `Index`
Types. As we test the range inference for arith Ops, especially around
the overflow behaviour, it's handy to have native support for the
typical integer types in the test Ops.
This patch
1. Changes the Attributes of `test.with_bounds` ops from `Index` to
`APInt` which matches the internal representation in
`ConstantIntRanges`.
2. Allows the use of `AnyInteger` in addition to `Index` for the
operands and results of the test Ops. This now requires explicit
specification of the type in the IR, where before `Index` was implicit.
3. Requires bounds Attrs to be specified in the precision of the SSA
value, eliminating any implicit truncation or extension. (*Could this
lead to problems?*)
This commit extends the SROA interfaces to ensure the interface
instantiations can communicate newly created allocators to the
algorithm. This ensures that the SROA implementation no longer
requires re-walking the IR to find new allocators.
This commit fixes Mem2Reg's multi-slot allocator handling and extends the
test dialect to test this.
Additionally, this modifies Mem2Reg's API to always attempt a full
promotion on all the passed in "allocators". This ensures that the pass
does not require unnecessary walks over the regions and improves caching
benefits.
This commit changes `OpBuilder::tryFold` to behave more similarly to
`Operation::fold`. Concretely, this ensures that even an in-place fold
returns `success`.
This is necessary to fix a bug in the dialect conversion that occurred
when an in-place folding made an operation legal. The dialect conversion
infrastructure did not check if the result of an in-place folding
legalized the operation and just went ahead and tried to apply patterns
anyway.
The added test contains a simplified version of a breakage we observed
downstream.
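The control flow at issue can be sketched with a toy legalizer. All names here are hypothetical and the model is far simpler than the dialect conversion driver; it only shows a fold that reports success for an in-place change, followed by a legality re-check before any pattern is tried.

```cpp
#include <cassert>

struct ToyOp {
  bool legal = false;
  bool patternApplied = false;
};

// Hypothetical in-place fold: mutates the op, makes it legal, and reports
// success even though no replacement values are produced.
inline bool tryFoldInPlace(ToyOp &op) {
  if (op.legal)
    return false;
  op.legal = true;
  return true;
}

inline void legalize(ToyOp &op) {
  if (op.legal)
    return;
  // After a successful fold, re-check legality before falling back to
  // pattern application.
  if (tryFoldInPlace(op) && op.legal)
    return;
  op.patternApplied = true;
}
```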
This PR massively reorganizes the Test dialect's source files. It moves
manually-written op hooks into `TestOpDefs.cpp`, moves format custom
directive parsers and printers into `TestFormatUtils`, adds missing
comment blocks, and moves around where generated source files are
included for types, attributes, enums, etc. into their own source file.
This will hopefully make the test dialect source code easier to navigate,
and it also speeds up the compile time of the test dialect by putting
generated source files into separate compilation units.
This also sets up the test dialect to shard its op definitions, done in
the next PR.