This PR adds basic support for defining regions in Python-defined
dialects. Example usage:
```python
class TestRegion(Dialect, name="ext_region"):
    pass


class IfOp(TestRegion.Operation, name="if"):
    cond: Operand[IntegerType[1]]
    then: Region
    else_: Region
```
Current limitations:
* We can’t specify region constraints yet (e.g., number of blocks or
block argument types). This will be addressed as a follow-up task.
* We can’t mark an op as a `Terminator` or `NoTerminator` yet. This
depends on `DynamicOpTraits` (#177735) and Python-side trait API
support, and will be implemented in a follow-up PR.
This is the first PR after splitting off #179032.
This is a follow-up PR of #169045.
---------
Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
Python bindings for the IRDL dialect were introduced in #158488. They
are currently usable for constructing IR and for dynamically loading
modules that contain `irdl.dialect` into MLIR. However, there are still
several pain points when working with them:
* The IRDL IR-building interface is not very intuitive and tends to be
quite verbose.
* We do not yet have the corresponding `OpView` classes for IRDL-defined
operations.
To address these issues, I propose creating a wrapper (effectively a
small “DSL”) on top of the existing IRDL Python bindings. This wrapper
aims to simplify IR construction and automatically generate the
corresponding `OpView` types. A simple example is shown below.
Currently, using the IRDL bindings looks like this:
```python
m = Module.create()
with InsertionPoint(m.body):
    myint = irdl.dialect("myint")
    with InsertionPoint(myint.body):
        constant = irdl.operation_("constant")
        with InsertionPoint(constant.body):
            iattr = irdl.base(base_name="#builtin.integer")
            i32 = irdl.is_(TypeAttr.get(IntegerType.get_signless(32)))
            irdl.attributes_([iattr], ["value"])
            irdl.results_([i32], ["cst"], [irdl.Variadicity.single])
        add = irdl.operation_("add")
        with InsertionPoint(add.body):
            i32 = irdl.is_(TypeAttr.get(IntegerType.get_signless(32)))
            irdl.operands_(
                [i32, i32],
                ["lhs", "rhs"],
                [irdl.Variadicity.single, irdl.Variadicity.single],
            )
            irdl.results_([i32], ["res"], [irdl.Variadicity.single])

irdl.load_dialects(m)
```
With the proposed DSL (module name `mlir.dialects.ext`), the equivalent
implementation becomes:
```python
class MyInt(Dialect, name="myint"):
    pass


i32 = IntegerType[32]


class ConstantOp(MyInt.Operation, name="constant"):
    value: IntegerAttr
    cst: Result[i32]


class AddOp(MyInt.Operation, name="add"):
    lhs: Operand[i32]
    rhs: Operand[i32]
    res: Result[i32]


MyInt.load()
```
Compared with the current IRDL Python bindings, this DSL mainly adds the
following:
* **A more intuitive interface** for constructing IRDL definitions (as
shown in the example).
* **Automatic generation of the corresponding `OpView`
classes**—including `__init__` methods and property getters for each
defined operation. Similar to TableGen’s `ins`, operands and attributes
can be interleaved in arbitrary order. Special handling is also
implemented for optional and variadic operands/results (such as
computing segment sizes) so that they feel as natural to use as native
operations.
* **Lazy insertion of ops**: all ops are created and inserted only when
`Dialect.load()` is called, which makes it unnecessary to specify an
MLIR context immediately when defining an IRDL dialect.
* **Basic type inference** in operation builders (i.e.
`OpViewCls.__init__`) for trivial result types.
The current DSL does not yet cover all IRDL operations. Several features
are not supported at the moment:
- Defining new types or attributes
- Parametric constraints
- Adding regions to operations
---------
Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
Extend linalg.pack and linalg.unpack to accept memref operands in
addition to tensors. As part of this change, we now disable all
transformations when these ops have memref semantics.
Closes https://github.com/llvm/llvm-project/issues/129004
---------
Signed-off-by: Ryutaro Okada <1015ryu88@gmail.com>
Co-authored-by: Hyunsung Lee <ita9naiwa@gmail.com>
This is a continuation of the idea from #174091 to add `match` support
for MLIR containers. In this PR the `OpAttributeMap` container is
registered as a `Mapping`, so it can be matched as a "dictionary" in
`match` statements.
For this to work, the `get(key, default=None)` method had to be
implemented. The implementations are pretty much copies of
`dunderGetItemNamed` and `dunderGetItemIndexed` with an added argument and
`nb::object` as the return type, because they can now return types other
than just `PyAttribute`.
I was unsure whether I should refactor this to make `dunderGetItem...` use
the new `getWithDefault...` or whether a separate method is preferred. I
kept it as a copy for simplicity's sake for now.
Even though `OpAttributeMap` supports indexing by `int` and `str`,
Python does not allow registering it as a `Sequence` and a `Mapping` at
the same time. If it is registered as a `Sequence`, it only returns the
attribute names as strings, not as `NamedAttribute`. It is technically
possible to also use integer keys for the `dict`-like match, but that
doesn't provide any constraints on the number of attributes, etc., so
it is probably not recommended.
<details><summary>Example</summary>
```python
from mlir.ir import Context, Module, OpAttributeMap
from collections.abc import Sequence

ctx = Context()
ctx.allow_unregistered_dialects = True
module = Module.parse(
    r"""
    "some.op"() { some.attribute = 1 : i8,
                  other.attribute = 3.0,
                  dependent = "text" } : () -> ()
    """,
    ctx,
)
op = module.body.operations[0]


def test(attr):
    match attr:
        case [*args]:
            print("matched a Sequence", args)
        case _:
            print("Didn't match as Sequence")

    match attr:
        case {"some.attribute": a, "other.attribute": b, "dependent": c}:
            print("Matched as Mapping individually", a, b, c)
        case _:
            print("Didn't match a Mapping")

    match attr:
        case {0: a, 1: b}:
            print("Matched as Mapping with 2 int keys", a, b)
        case _:
            print("Didn't match as Mapping with 2 int keys")


print("Registered as Mapping only:")
test(op.attributes)

print("\nAfter additionally registering as Sequence:")
Sequence.register(OpAttributeMap)
test(op.attributes)
```
Output:
```
Registered as Mapping only:
Didn't match as Sequence
Matched as Mapping individually 1 : i8 3.000000e+00 : f64 "text"
Matched as Mapping with 2 int keys NamedAttribute(dependent="text") NamedAttribute(other.attribute=3.000000e+00 : f64)

After additionally registering as Sequence:
matched a Sequence ['dependent', 'other.attribute', 'some.attribute']
Didn't match a Mapping
Didn't match as Mapping with 2 int keys
```
</details>
@makslevental Would be great if you could take a look again ❤️
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
We've been able to do `isinstance(x, Type)` for quite a while now
(since bfb1ba7526), so remove `Type.isinstance` and the special-casing
(`_is_integer_type`, `_is_floating_point_type`, `_is_index_type`) in
some places (and therefore support various `fp8`, `fp6`, `fp4` types).
This PR ports all in-tree dialect extensions to use the
`PyConcreteType`, `PyConcreteAttribute` CRTPs instead of
`mlir_pure_subclass`. After this PR we can soft deprecate
`mlir_pure_subclass`. Also API signatures are updated to use `Py*`
instead of `Mlir*` so that type "inference" and hints are improved.
This PR continues the work of
https://github.com/llvm/llvm-project/pull/171775 by moving more useful
types/attributes into MLIRPythonSupport.
You can now do
```c++
struct PyTestIntegerRankedTensorType
    : mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteType<
          PyTestIntegerRankedTensorType,
          mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyRankedTensorType>

struct PyTestTensorValue
    : mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteValue<
          PyTestTensorValue>
```
instead of `mlir_type_subclass` and `mlir_value_subclass`;
**specifically, manual registration of the "value caster" via indirection
through the Python interpreter is no longer necessary**. You can also
now freely use all such types at the nanobind API level (e.g., overload
based on `FP*`):
```c++
using namespace mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN;
standaloneM.def("print_fp_type", [](PyF16Type &) { nb::print("this is a fp16 type"); });
standaloneM.def("print_fp_type", [](PyF32Type &) { nb::print("this is a fp32 type"); });
standaloneM.def("print_fp_type", [](PyF64Type &) { nb::print("this is a fp64 type"); });
```
Note, here we only port `PythonTestModuleNanobind` but there is a
follow-up PR that ports **all** in-tree dialect extensions
https://github.com/llvm/llvm-project/pull/174156 to use these. After
that one we can soft deprecate `mlir_pure_subclass`.
Note, depends on https://github.com/llvm/llvm-project/pull/171775
# What
This PR adds a shared library `MLIRPythonSupport` which contains all of
the CRTP classes like `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute`, as well as other useful code like `Defaulting*`,
etc., enabling their reuse in downstream projects. Downstream projects
can now do
```c++
struct PyTestType : mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteType<PyTestType> {
  ...
};

class PyTestAttr : public mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteAttribute<PyTestAttr> {
  ...
};

NB_MODULE(_mlirPythonTestNanobind, m) {
  PyTestType::bind(m);
  PyTestAttr::bind(m);
}
```
instead of using the discordant alternative
`mlir_type_subclass`/`mlir_attr_subclass` (same goes for
`PyConcreteValue`/`mlir_value_subclass`).
# Why
This PR is mostly code motion (along with CMake) but before I describe
the changes I want to state the goals/benefits:
1. Currently upstream "core" extensions and "dialect" extensions ([all
of the `Dialect*` extensions
here](d7c734b5a1/mlir/lib/Bindings/Python))
are a two-tier system;
**a**. [core
extensions](https://github.com/llvm/llvm-project/blob/main/mlir/lib/Bindings/Python/IRTypes.cpp#L361)
enjoy first-class support as far as type inference[^3], type stub
generation, and ease of implementation, while dialect extensions [have
poorer support](https://reviews.llvm.org/D150927), incorrect type stub
generation, and a much more tedious (boilerplate-heavy) implementation;
**b**. Crucially, this two-tiered system is reflected in the fact that
**the two sets of types/attributes are not in the same Python object
hierarchy**. To wit: `isinstance(..., Type)` and `isinstance(...,
Attribute)` are not supported for the dialect extensions[^2];
**c**. Since these types are not exposed in public headers, downstream
users (dialect extensions or not) cannot write functions that overload
on e.g. `PyFloat8*Type` - that's quite a [useful
feature](fdbee98df8/cpp_ext/TorchOps.cpp (L29-L69))!
2. The dialect extensions incur a sizeable performance penalty relative
to the core extensions in that every single trip across the wire (either
`python->cpp` or `cpp->python`) requires work in addition to nanobind's
own casting/construction pipeline;
**a**. When going from `python->cpp`, [we extract the capsule object
from the Python
object](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Bindings/Python/NanobindAdaptors.h#L219C24-L219C46)
and then extract from the capsule the `Mlir*` opaque struct/ptr. This
side isn't so onerous;
**b**. When going from `cpp->python` we call Python `import` APIs
long-hand and construct the Python object using `_CAPICreate`. Note,
there are at least 2 `attr` calls incurred in addition to `_CAPICreate`;
this is already much more [efficiently handled by nanobind
itself](4ba51fcf79/src/nb_internals.h (L381-L382))!
3. This division blocks various features: in some configurations[^1] we
trigger a circular import bug because "dialect" types and attributes
perform an [import of the root `_mlir`
module](bd9651bf78/mlir/include/mlir/Bindings/Python/NanobindAdaptors.h (L585))
when they are created (the types themselves, not even instances of those
types). This blocks type stub generation for dialect extensions (i.e.,
the reason we currently only generate type stubs for `_mlir`).
# How
Previously this was not done/possible because of "ODR" issues, but I have
resolved those issues; the basic idea for how we solve this is "move
things we want to share into shared libraries":
1. Move IRCore (stuff like `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute`) into `MLIRPythonSupport`;
- Note, we move the rest of the things in `IRModule.h` (renamed to
`IRCore.h`) because `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute` depend on them. This makes for a bigger PR than
one would hope for but ultimately I think we should give people access
to these classes to use as they see fit (specifically inherit from, but
also liberally use in bindings signatures instead of the opaque `Mlir*`
struct wrappers).
2. Put all of this code into a nested namespace
`MLIR_BINDINGS_PYTHON_DOMAIN` which is determined by a compile time
define (and tied to `MLIR_BINDINGS_PYTHON_NB_DOMAIN`). This is necessary
in order to prevent conflicts on both symbol name **and** typeid
(necessary for nanobind to not double register bound types) between
multiple bindings libraries (e.g., `torch-mlir` and `jax`). Note
[nanobind doesn't support `module_local` like
pybind11](https://nanobind.readthedocs.io/en/latest/porting.html#removed-features).
It does support `NB_DOMAIN` but that is not sufficient for
disambiguating typeids across projects (to wit: we currently define
`NB_DOMAIN` and it was still necessary to move everything to a nested
namespace);
3. Build the [nanobind library itself as a shared
object](https://github.com/wjakob/nanobind/blob/master/cmake/nanobind-config.cmake#L127)
(and link it to both the extensions and `MLIRPythonSupport`).
4. Add the CMake changes needed to make this work in-tree, out-of-tree,
downstream, upstream, etc.
# Testing
Three tests are added here:
1. `PythonTestModuleNanobind` is ported to use
`PyConcreteType<PyTestType>` instead of `mlir_type_subclass` and
`PyConcreteAttribute<PyTestAttr>` instead of `mlir_attr_subclass`,
verifying this works for non-core extensions in-tree;
2. `StandaloneExtensionNanobind` is ported to use `struct PyCustomType :
mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteType<PyCustomType>`
instead of `mlir_type_subclass` verifying this works for non-core
extensions out-of-tree;
3. `StandaloneExtensionNanobind`'s `smoketest` is extended to also load
another bindings package (namely `mlir`) verifying
`MLIR_BINDINGS_PYTHON_DOMAIN` successfully disambiguates symbols and
typeids.
I have also tested this downstream
(https://github.com/llvm/eudsl/pull/287) and run the following builder
bots:
mlir-nvidia-gcc7:
https://lab.llvm.org/buildbot/#/buildrequests/6654424?redirect_to_build=true
I have also tested against IREE:
https://github.com/iree-org/iree/pull/21916
# Integration
It is highly recommended to set the CMake var
`MLIR_BINDINGS_PYTHON_NB_DOMAIN` (which will also determine
`MLIR_BINDINGS_PYTHON_DOMAIN`) to something unique for each downstream.
This can also be passed explicitly to `add_mlir_python_modules` if your
project builds multiple bindings packages. I added a `WARNING` to this
effect in `AddMLIRPython.cmake`.
[^3]: Python values being typed correctly when exiting from cpp;
[^1]: Specifically when the modules are imported using `importlib`,
which occurs with nanobind's
[stubgen](https://github.com/wjakob/nanobind/blob/master/src/stubgen.py#L965);
[^2]: The workaround we implemented was a class method for the dialect
bindings called `Class.isinstance(...)`;
This allows these containers to be used in `match` statements, which
allows extracting properties and asserting a shape at the same time.
It seems to be possible to match only as _either_ a `Mapping` _or_ a
`Sequence`, so the `OpAttributeMap` is only a `Mapping`.
I couldn't find a way to make these C++-based types properly inherit
from `Sequence` or `Mapping`, so the mixin methods are not provided
(nanobind only allows C++ parent classes, and modifying `__base__`
complains about differing destructors).
`OpAttributeMap` was lacking the `get` method, so I simply copied it
from `collections.abc.Mapping`.
When writing the tests, I ran into an error because I wrote
`func.FuncOp(body=[Block(...)])` instead of
`func.FuncOp(body=Region(blocks=[Block(...)]))`. So maybe turning
`Region` itself into a `Sequence` would be a good addition as well? That
would extend the scope of this PR, though.
@makslevental You suggested I make the PR, so I'm tagging you here as a
potential reviewer. I hope that is OK with you. :)
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
Fixes: #164800
Ensures unsigned pooling ops in Linalg stay in the integer domain: the
lowering now rejects floating/bool inputs with a clear diagnostic, new
regression tests lock in both the error path and a valid integer
example, and transform decompositions are updated to reflect the integer
typing.
Signed-off-by: Akimasa Watanuki <mencotton0410@gmail.com>
`from ._xxx_ops_gen import _Dialect` appears in some dialect modules,
like builtin, scf, and irdl, but not all of them. This PR ensures that
for upstream dialects, `<dialect module>._Dialect` is available, like
`arith._Dialect`.
This PR is a prerequisite for the work I’m currently doing. Later on,
I’d like to use these `_Dialect` objects via something like
`conversion_target.add_legal_dialect(arith._Dialect)` (we could of
course just use strings like `add_legal_dialect("arith")`, but compared
to using a defined symbol, I think that’s more prone to typos).
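
The typo argument can be made concrete with a small pure-Python sketch. Everything here is hypothetical: `ArithDialect` stands in for `arith._Dialect`, `ConversionTarget.add_legal_dialect` is a toy, and the `DIALECT_NAMESPACE` attribute is an assumption about how a namespace could be read off a dialect class.

```python
class ArithDialect:
    """Hypothetical stand-in for `arith._Dialect`."""

    DIALECT_NAMESPACE = "arith"  # assumed attribute carrying the namespace


class ConversionTarget:
    """Toy target that accepts either a dialect class or a namespace string."""

    def __init__(self):
        self.legal_dialects = set()

    def add_legal_dialect(self, dialect):
        if isinstance(dialect, str):
            namespace = dialect
        else:
            namespace = dialect.DIALECT_NAMESPACE
        self.legal_dialects.add(namespace)


target = ConversionTarget()
target.add_legal_dialect(ArithDialect)
target.add_legal_dialect("airth")  # misspelled string: silently accepted

try:
    target.add_legal_dialect(AirthDialect)  # misspelled symbol: fails right here
    symbol_typo_caught = False
except NameError:
    symbol_typo_caught = True
```

The misspelled string ends up in `legal_dialects` unnoticed, while the misspelled symbol raises at the call site.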
This is a follow-up of #171957 that updates the argument names of the
`scf.if` Python binding to be consistent with `affine.if`. Basically,
both operations should use `has_else` to determine whether the `else`
block is present.
cc @makslevental
Friendlier wrapper for transform.foreach.
To facilitate that friendliness, makes it so that OpResult.owner returns
the relevant OpView instead of Operation. For good measure, also changes
Value.owner to return OpView instead of Operation, thereby ensuring
consistency. That is, makes it so that all op-returning .owner
accessors return OpView (and thereby give access to all goodies
available on registered OpViews).
Reland of #171544 with a fixup for the integration test.
Friendlier wrapper for `transform.foreach`.
To facilitate that friendliness, makes it so that `OpResult.owner`
returns the relevant `OpView` instead of `Operation`. For good measure,
also changes `Value.owner` to return `OpView` instead of `Operation`,
thereby ensuring consistency. That is, makes it so that all
op-returning `.owner` accessors return `OpView` (and thereby give access
to all goodies available on registered `OpView`s).
This bug was introduced by #108323, where the loc and ip were not
properly set. It may lead to errors when the operations are not
inserted linearly into the IR.
There were two bugs lurking in `mlir.ir.loc_tracebacks()`:
1) The default `None` parameter was not handled correctly (it was passed
to a C++ function that expects ints).
2) The `yield` was incorrectly indented, meaning `loc_tracebacks()`
could not be nested (a "generator didn't yield" exception would be
raised).
Added testing of `loc_tracebacks()` by replacing the custom context
manager in the `auto_location.py` test with the `loc_tracebacks()` API.
Had to harden the test against line number differences.
---------
Co-authored-by: James Molloy <jmolloy@google.com>
Disallow implicit casting, which is surprising, and, IME, usually
indicative of copy-paste errors.
Because the initial value must be a scalar, I don't expect this to
affect any data movement.
The C++ index switch op has utilities for `getCaseBlock(int i)` and
`getDefaultBlock()`, so these have been added.
Optional body builder args have been added: one for the default case and
one for the switch cases.
Updates the derived Op-classes for the main transform ops to have all
the arguments, etc, from the auto-generated classes. Additionally
updates and adds missing snake_case wrappers for the derived classes
which shadow the snake_case wrappers of the auto-generated classes,
which were hitherto exposed alongside the derived classes.
Adds the first XeGPU transform op, `xegpu.set_desc_layout`, which attaches a `xegpu.layout` attribute to the descriptor that a `xegpu.create_nd_tdesc` op returns.
Add builders on the Python side that match the builders on the C++ side, add tests for launching GPU kernels and regions, and correct some small documentation mistakes. This reflects the API decisions already made in the func dialect's Python bindings and makes using the GPU dialect's bindings more similar to the C++ interface.
By allowing `transform.smt.constrain_params`'s region to yield SMT-vars,
op instances can declare relationships, through constraints, on incoming
params-as-SMT-vars and outgoing SMT-vars-as-params. This makes it
possible to declare that computations on params should be performed.
The semantics are that the yielded SMT-vars should be from any valid
satisfying assignment/model of the constraints in the region.
Adds initial support for Python bindings to the OpenACC dialect.
* The bindings do not provide any niceties yet, just the barebones
exposure of the dialect to Python. Construction of OpenACC ops is
therefore verbose and somewhat inconvenient, as evidenced by the test.
* The test only constructs one module, but I attempted to use enough
operations to be meaningful. It does not test all the ops exposed, but
does contain a realistic example of a memcpy idiom.
The func dialect provides a more pythonic interface for constructing
operations, but the gpu dialect does not; this is the first PR to
provide the same conveniences for the gpu dialect, starting with the
gpu.func op.
Changes to linalg `structured.fuse` transform op:
* Adds an optional `use_forall` boolean argument which generates a tiled
`scf.forall` loop instead of `scf.for` loops.
* `tile_sizes` can now be any parameter or handle.
* `tile_interchange` can now be any parameter or handle.
* IR formatting changes from `transform.structured.fuse %0 [4, 8] ...`
to `transform.structured.fuse %0 tile_sizes [4, 8] ...`
- boolean arguments are now `UnitAttrs` and should be set via the op
attr-dict: `{apply_cleanup, use_forall}`
This is a follow-up PR for #162699.
Currently, in the function where we define rewrite patterns, the `op` we
receive is of type `ir.Operation` rather than a specific `OpView` type
(such as `arith.AddIOp`). This means we can’t conveniently access
certain parts of the operation — for example, we need to use
`op.operands[0]` instead of `op.lhs`. The following example code
illustrates this situation.
```python
def to_muli(op, rewriter):
    # op is typed ir.Operation instead of arith.AddIOp
    pass


patterns.add(arith.AddIOp, to_muli)
```
In this PR, we convert the operation to its corresponding `OpView`
subclass before invoking the rewrite pattern callback, making it much
easier to write patterns.
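
The dispatch change can be sketched without MLIR. All classes below (`Operation`, `OpView`, `AddIOp`, `PatternSet`) are simplified, hypothetical stand-ins rather than the real bindings; the point is only that the pattern set wraps the generic operation in the registered view class before calling the user callback.

```python
class Operation:
    """Hypothetical generic operation: a name plus positional operands."""

    def __init__(self, name, operands):
        self.name = name
        self.operands = list(operands)


class OpView:
    """Hypothetical typed wrapper around a generic operation."""

    OPERATION_NAME = None

    def __init__(self, operation):
        self.operation = operation


class AddIOp(OpView):
    OPERATION_NAME = "arith.addi"

    @property
    def lhs(self):
        return self.operation.operands[0]

    @property
    def rhs(self):
        return self.operation.operands[1]


class PatternSet:
    def __init__(self):
        self._patterns = {}

    def add(self, view_cls, callback):
        self._patterns[view_cls.OPERATION_NAME] = (view_cls, callback)

    def apply(self, operation, rewriter=None):
        entry = self._patterns.get(operation.name)
        if entry is None:
            return None
        view_cls, callback = entry
        # Convert to the specific view before invoking the callback.
        return callback(view_cls(operation), rewriter)


def to_muli(op, rewriter):
    # `op` arrives as AddIOp, so named accessors are available.
    return (type(op).__name__, op.lhs, op.rhs)


patterns = PatternSet()
patterns.add(AddIOp, to_muli)
result = patterns.apply(Operation("arith.addi", ["%a", "%b"]))
```

In this toy version the callback can use `op.lhs` and `op.rhs` directly instead of indexing `op.operands`.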
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This op enables expressing uncertainty regarding what should be
happening at particular places in transform-dialect schedules. In
particular, it enables representing a choice among alternative regions.
This choice is resolved through providing a `selected_region` argument.
When this argument is provided, the semantics are such that it is valid
to rewrite the op through substituting in the selected region -- with
the op's interpreted semantics corresponding to exactly this.
This op represents another piece of the puzzle w.r.t. a toolkit for
expressing autotuning problems with the transform dialect. Note that
this goes beyond tuning knobs _on_ transforms, going further by making
it tunable which (sequences of) transforms are to be applied.
Transform op to request that a tensor value live in a specific memory
space after bufferization.
Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Co-authored-by: Alex Zinenko <ftynse@gmail.com>