This is a follow-up PR of #169045 and the second part of #179086.
In #179086, we added support for defining regions in Python-defined ops,
but its usefulness was quite limited because we still couldn’t mark an
op as a `Terminator` or `NoTerminator`. In this PR, we port the
`DynamicOpTrait` (introduced on the C++ side for `DynamicDialect` in
#177735) to Python, so we can dynamically attach traits to
Python-defined ops.
This PR adds basic support for defining regions in Python-defined
dialects. Example usage:
```python
class TestRegion(Dialect, name="ext_region"):
    pass

class IfOp(TestRegion.Operation, name="if"):
    cond: Operand[IntegerType[1]]
    then: Region
    else_: Region
```
Current limitations:
* We can’t specify region constraints yet (e.g., number of blocks or
block argument types). This will be addressed as a follow-up task.
* We can’t mark an op as a `Terminator` or `NoTerminator` yet. This
depends on `DynamicOpTraits` (#177735) and Python-side trait API
support, and will be implemented in a follow-up PR.
This is the first PR after splitting off #179032.
This is a follow-up PR of #169045.
---------
Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
This commit makes the following changes:
- Adds the tcgen05.ld.red Op with tests under tcgen05-ld-red.mlir and
tcgen05-ld-red-invalid.mlir
- Renames ReduxKind to ReductionKind across the NVVM and GPU Dialects
- Replaces Tcgen05LdRedOperationAtr with ReductionKindAttr
- Updates the tcgen05.ld.red and nvvm.redux.sync tests
Python bindings for the IRDL dialect were introduced in #158488. They
are currently usable for constructing IR and dynamically loading modules
that contain `irdl.dialect` into MLIR. However, there are still several
pain points when working with them:
* The IRDL IR-building interface is not very intuitive and tends to be
quite verbose.
* We do not yet have the corresponding `OpView` classes for IRDL-defined
operations.
To address these issues, I propose creating a wrapper (effectively a
small “DSL”) on top of the existing IRDL Python bindings. This wrapper
aims to simplify IR construction and automatically generate the
corresponding `OpView` types. A simple example is shown below.
Currently, using the IRDL bindings looks like this:
```python
m = Module.create()
with InsertionPoint(m.body):
    myint = irdl.dialect("myint")
    with InsertionPoint(myint.body):
        constant = irdl.operation_("constant")
        with InsertionPoint(constant.body):
            iattr = irdl.base(base_name="#builtin.integer")
            i32 = irdl.is_(TypeAttr.get(IntegerType.get_signless(32)))
            irdl.attributes_([iattr], ["value"])
            irdl.results_([i32], ["cst"], [irdl.Variadicity.single])
        add = irdl.operation_("add")
        with InsertionPoint(add.body):
            i32 = irdl.is_(TypeAttr.get(IntegerType.get_signless(32)))
            irdl.operands_(
                [i32, i32],
                ["lhs", "rhs"],
                [irdl.Variadicity.single, irdl.Variadicity.single],
            )
            irdl.results_([i32], ["res"], [irdl.Variadicity.single])

irdl.load_dialects(m)
```
With the proposed DSL (module name `mlir.dialects.ext`), the equivalent
implementation becomes:
```python
class MyInt(Dialect, name="myint"):
    pass

i32 = IntegerType[32]

class ConstantOp(MyInt.Operation, name="constant"):
    value: IntegerAttr
    cst: Result[i32]

class AddOp(MyInt.Operation, name="add"):
    lhs: Operand[i32]
    rhs: Operand[i32]
    res: Result[i32]

MyInt.load()
```
Compared with the current IRDL Python bindings, this DSL mainly adds the
following:
* **A more intuitive interface** for constructing IRDL definitions (as
shown in the example).
* **Automatic generation of the corresponding `OpView`
classes**—including `__init__` methods and property getters for each
defined operation. Similar to TableGen’s `ins`, operands and attributes
can be interleaved in arbitrary order. Special handling is also
implemented for optional and variadic operands/results (such as
computing segment sizes) so that they feel as natural to use as native
operations.
* **Lazy insertion of ops**: all ops are created and inserted only when
`Dialect.load()` is called, which makes it unnecessary to specify an
MLIR context immediately when defining an IRDL dialect.
* **Basic type inference** in operation builders (i.e.
`OpViewCls.__init__`) for trivial result types.
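As a rough illustration of the mechanism behind such a DSL (not the actual `mlir.dialects.ext` implementation), the sketch below uses `__init_subclass__` and class annotations to collect field declarations in order, derive a positional `__init__` from them, and defer registration until `load()` is called. All names here (`ToyDialect`, `Operand`, `Result`, `Attr`, `pending`, `loaded`) are hypothetical stand-ins, and no MLIR is involved.

```python
# Toy model only: every name here is hypothetical and no MLIR is involved.
class Operand: ...
class Result: ...
class Attr: ...


class ToyDialect:
    """Records op classes at definition time and registers them on load()."""

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.name = name
        cls.pending = []  # op classes defined so far, not yet registered
        cls.loaded = {}   # "dialect.op" -> op class, filled in by load()
        dialect = cls

        class Operation:
            def __init_subclass__(op_cls, name=None, **kw):
                super().__init_subclass__(**kw)
                op_cls.op_name = f"{dialect.name}.{name}"
                # Annotations keep declaration order, so operands and
                # attributes may be interleaved freely.
                op_cls.fields = dict(getattr(op_cls, "__annotations__", {}))
                dialect.pending.append(op_cls)

            def __init__(self, *args):
                # Results are not constructor inputs; everything else is.
                inputs = [f for f, kind in self.fields.items() if kind is not Result]
                if len(args) != len(inputs):
                    raise TypeError(f"{self.op_name} expects inputs {inputs}")
                for field, value in zip(inputs, args):
                    setattr(self, field, value)

        cls.Operation = Operation

    @classmethod
    def load(cls):
        # Deferred registration: nothing is registered before this call.
        for op_cls in cls.pending:
            cls.loaded[op_cls.op_name] = op_cls
        cls.pending.clear()


class MyInt(ToyDialect, name="myint"):
    pass


class ConstantOp(MyInt.Operation, name="constant"):
    value: Attr
    cst: Result


class AddOp(MyInt.Operation, name="add"):
    lhs: Operand
    rhs: Operand
    res: Result


registered_before_load = dict(MyInt.loaded)  # still empty at this point
MyInt.load()
add = AddOp("a", "b")  # only the two operands are constructor arguments
```

The real DSL additionally emits IRDL ops and builds `OpView` classes; the toy only shows why no context is needed until `load()` runs.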
The current DSL does not yet cover all IRDL operations. Several features
are not supported at the moment:
- Defining new types or attributes
- Parametric constraints
- Adding regions to operations
---------
Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
Extend linalg.pack and linalg.unpack to accept memref operands in
addition to tensors. As part of this change, we now disable all
transformations when these ops have memref semantics.
Closes https://github.com/llvm/llvm-project/issues/129004
---------
Signed-off-by: Ryutaro Okada <1015ryu88@gmail.com>
Co-authored-by: Hyunsung Lee <ita9naiwa@gmail.com>
This PR adds "downcasting" of `ir.Value` to either `BlockArgument` or
`OpResult` (and then potentially further down if a user-registered
"value caster" exists). Also this PR changes `__str__` to return the
correct thing (`OpResult(...)` or `BlockArgument(...)` instead of
generic `Value(...)`).
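The two-step downcast can be pictured with a small pure-Python model. This is only a sketch of the idea, assuming a string-keyed caster registry; the classes and the `downcast` and `register_value_caster` helpers below are hypothetical toys and not the real `mlir.ir` API.

```python
# Toy model only: hypothetical classes, not the real `mlir.ir` API.
class Value:
    def __init__(self, type_name):
        self.type_name = type_name

    def __str__(self):
        # Print the most specific class name rather than a generic "Value".
        return f"{type(self).__name__}({self.type_name})"


class OpResult(Value): ...
class BlockArgument(Value): ...


_value_casters = {}  # type name -> callable that wraps a Value


def register_value_caster(type_name):
    def decorator(fn):
        _value_casters[type_name] = fn
        return fn
    return decorator


def downcast(value, is_block_argument):
    """Pick BlockArgument or OpResult, then apply a user caster if present."""
    cls = BlockArgument if is_block_argument else OpResult
    specific = cls(value.type_name)
    caster = _value_casters.get(value.type_name)
    return caster(specific) if caster else specific


class TensorValue(Value): ...


@register_value_caster("tensor")
def cast_tensor(value):
    return TensorValue(value.type_name)


plain = downcast(Value("i32"), is_block_argument=False)
arg = downcast(Value("i32"), is_block_argument=True)
tensor = downcast(Value("tensor"), is_block_argument=False)
```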
We've been able to do `isinstance(x, Type)` for quite a while now
(since bfb1ba7526), so remove `Type.isinstance` and the special-casing
(`_is_integer_type`, `_is_floating_point_type`, `_is_index_type`) in
some places (and therefore support various `fp8`, `fp6`, `fp4` types).
This PR ports all in-tree dialect extensions to use the
`PyConcreteType`, `PyConcreteAttribute` CRTPs instead of
`mlir_pure_subclass`. After this PR we can soft deprecate
`mlir_pure_subclass`. Also API signatures are updated to use `Py*`
instead of `Mlir*` so that type "inference" and hints are improved.
# What
This PR adds a shared library `MLIRPythonSupport` which contains all of
the CRTP classes like `PyConcreteValue`, `PyConcreteType`, and
`PyConcreteAttribute`, as well as other useful code like `Defaulting*`,
enabling their reuse in downstream projects. Downstream projects
can now do
```c++
struct PyTestType : mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteType<PyTestType> {
  ...
};

class PyTestAttr : public mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteAttribute<PyTestAttr> {
  ...
};

NB_MODULE(_mlirPythonTestNanobind, m) {
  PyTestType::bind(m);
  PyTestAttr::bind(m);
}
```
instead of using the discordant alternative
`mlir_type_subclass`/`mlir_attr_subclass` (same goes for
`PyConcreteValue`/`mlir_value_subclass`).
# Why
This PR is mostly code motion (along with CMake) but before I describe
the changes I want to state the goals/benefits:
1. Currently upstream "core" extensions and "dialect" extensions ([all
of the `Dialect*` extensions
here](d7c734b5a1/mlir/lib/Bindings/Python))
are a two-tier system;
**a**. [core
extensions](https://github.com/llvm/llvm-project/blob/main/mlir/lib/Bindings/Python/IRTypes.cpp#L361)
enjoy first class support as far as type inference[^3], type stub
generation, and ease of implementation, while dialect extensions [have
poorer support](https://reviews.llvm.org/D150927), incorrect type stub
generation, and a much more tedious (boilerplate-heavy) implementation;
**b**. Crucially, this two-tiered system is reflected in the fact that
**the two sets of types/attributes are not in the same Python object
hierarchy**. To wit: `isinstance(..., Type)` and `isinstance(...,
Attribute)` are not supported for the dialect extensions[^2];
**c**. Since these types are not exposed in public headers, downstream
users (dialect extensions or not) cannot write functions that overload
on e.g. `PyFloat8*Type` - that's quite a [useful
feature](fdbee98df8/cpp_ext/TorchOps.cpp (L29-L69))!
2. The dialect extensions incur a sizeable performance penalty relative
to the core extensions in that every single trip across the wire (either
`python->cpp` or `cpp->python`) requires work in addition to nanobind's
own casting/construction pipeline;
**a**. When going from `python->cpp`, [we extract the capsule object
from the Python
object](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Bindings/Python/NanobindAdaptors.h#L219C24-L219C46)
and then extract from the capsule the `Mlir*` opaque struct/ptr. This
side isn't so onerous;
**b**. When going from `cpp->python` we call the Python `import` APIs
long-hand and construct the Python object using `_CAPICreate`. Note,
there are at least 2 `attr` calls incurred in addition to `_CAPICreate`;
this is already much more [efficiently handled by nanobind
itself](4ba51fcf79/src/nb_internals.h (L381-L382))!
3. This division blocks various features: in some configurations[^1] we
trigger a circular import bug because "dialect" types and attributes
perform an [import of the root `_mlir`
module](bd9651bf78/mlir/include/mlir/Bindings/Python/NanobindAdaptors.h (L585))
when they are created (the types themselves, not even instances of those
types). This blocks type stub generation for dialect extensions (i.e.,
the reason we currently only generate type stubs for `_mlir`).
# How
Previously this was not done/possible because of "ODR" issues, but I have
resolved those issues; the basic idea for how we solve this is "move
things we want to share into shared libraries":
1. Move IRCore (stuff like `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute`) into `MLIRPythonSupport`;
- Note, we move the rest of the things in `IRModule.h` (renamed to
`IRCore.h`) because `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute` depend on them. This makes for a bigger PR than
one would hope for but ultimately I think we should give people access
to these classes to use as they see fit (specifically inherit from, but
also liberally use in bindings signatures instead of the opaque `Mlir*`
struct wrappers).
2. Put all of this code into a nested namespace
`MLIR_BINDINGS_PYTHON_DOMAIN` which is determined by a compile time
define (and tied to `MLIR_BINDINGS_PYTHON_NB_DOMAIN`). This is necessary
in order to prevent conflicts on both symbol name **and** typeid
(necessary for nanobind to not double-register bound types) between
multiple bindings libraries (e.g., `torch-mlir`, and `jax`). Note
[nanobind doesn't support `module_local` like
pybind11](https://nanobind.readthedocs.io/en/latest/porting.html#removed-features).
It does support `NB_DOMAIN` but that is not sufficient for
disambiguating typeids across projects (to wit: we currently define
`NB_DOMAIN` and it was still necessary to move everything to a nested
namespace);
3. Build the [nanobind library itself as a shared
object](https://github.com/wjakob/nanobind/blob/master/cmake/nanobind-config.cmake#L127)
(and link it to both the extensions and `MLIRPythonSupport`).
4. CMake to make this work, in-tree, out-of-tree, downstream, upstream,
etc.
# Testing
Three tests are added here
1. `PythonTestModuleNanobind` is ported to use
`PyConcreteType<PyTestType>` instead of `mlir_type_subclass` and
`PyConcreteAttribute<PyTestAttr>` instead of `mlir_attr_subclass`,
verifying this works for non-core extensions in-tree;
2. `StandaloneExtensionNanobind` is ported to use `struct PyCustomType :
mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteType<PyCustomType>`
instead of `mlir_type_subclass` verifying this works for non-core
extensions out-of-tree;
3. `StandaloneExtensionNanobind`'s `smoketest` is extended to also load
another bindings package (namely `mlir`) verifying
`MLIR_BINDINGS_PYTHON_DOMAIN` successfully disambiguates symbols and
typeids.
I have also tested this downstream
(https://github.com/llvm/eudsl/pull/287) as well as run the following
builder bots:
mlir-nvidia-gcc7:
https://lab.llvm.org/buildbot/#/buildrequests/6654424?redirect_to_build=true
I have also tested against IREE:
https://github.com/iree-org/iree/pull/21916
# Integration
It is highly recommended to set the CMake var
`MLIR_BINDINGS_PYTHON_NB_DOMAIN` (which will also determine
`MLIR_BINDINGS_PYTHON_DOMAIN`) to something unique for each downstream.
This can also be passed explicitly to `add_mlir_python_modules` if your
project builds multiple bindings packages. I added a `WARNING` to this
effect in `AddMLIRPython.cmake`.
[^3]: Python values being typed correctly when exiting from cpp;
[^1]: Specifically when the modules are imported using `importlib`,
which occurs with nanobind's
[stubgen](https://github.com/wjakob/nanobind/blob/master/src/stubgen.py#L965);
[^2]: The workaround we implemented was a class method for the dialect
bindings called `Class.isinstance(...)`;
Fixes: #164800
Ensures unsigned pooling ops in Linalg stay in the integer domain: the
lowering now rejects floating/bool inputs with a clear diagnostic, new
regression tests lock in both the error path and a valid integer
example, and transform decompositions are updated to reflect the integer
typing.
Signed-off-by: Akimasa Watanuki <mencotton0410@gmail.com>
This is a follow-up of #171957 that updates the argument names of the
`scf.if` Python binding to be consistent with `affine.if`. Basically,
both operations should use `has_else` to determine whether the `else`
block is present.
cc @makslevental
Friendlier wrapper for transform.foreach.
To facilitate that friendliness, makes it so that OpResult.owner returns
the relevant OpView instead of Operation. For good measure, also changes
Value.owner to return OpView instead of Operation, thereby ensuring
consistency. That is, makes it so that all op-returning .owner
accessors return OpView (and thereby give access to all goodies
available on registered OpViews.)
Reland of #171544 due to fixup for integration test.
Friendlier wrapper for `transform.foreach`.
To facilitate that friendliness, makes it so that `OpResult.owner`
returns the relevant `OpView` instead of `Operation`. For good measure,
also changes `Value.owner` to return `OpView` instead of `Operation`,
thereby ensuring consistency. That is, makes it so that all
op-returning `.owner` accessors return `OpView` (and thereby give access
to all goodies available on registered `OpView`s.)
This bug was introduced by #108323, where the loc and ip were not
properly set. It may lead to errors when the operations are not linearly
inserted into the IR.
Following a series of refactorings, MLIR Python bindings would crash if
a dialect object requiring a context defined using
mlir_attribute/type_subclass was constructed outside of the `ir.Context`
context manager. The type caster for `MlirContext` would try using
`ir.Context.current` when the default `None` value was provided to the
`get`, which would also just return `None`. The caster would then
attempt to obtain the MLIR capsule for that `None`, fail, but access it
anyway without checking, leading to a C++ assertion failure or segfault.
Guard against this case in nanobind adaptors. Also emit a warning to the
user to clarify expectations, as the default message confusingly says
that `None` is accepted as context and then fails with a type error.
Using Python C API is currently recommended by nanobind in this case
since the surrounding function must be marked `noexcept`.
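The control flow of the guard can be sketched in plain Python. This is an analogy only: the actual fix lives in the C++ nanobind adaptors, and the `resolve_context` helper, the `_current_context` variable, and the message text below are hypothetical.

```python
import warnings

# Hypothetical stand-in for "the context set by an active context manager".
_current_context = None


def resolve_context(context=None):
    """Fall back to the current context, and fail cleanly if there is none."""
    if context is None:
        context = _current_context
    if context is None:
        # Warn first to explain the expectation, then signal a type error
        # instead of dereferencing a missing capsule.
        warnings.warn(
            "no context was provided and no current context is active",
            RuntimeWarning,
        )
        raise TypeError("a context is required")
    return context


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        resolve_context()
        outcome = "resolved"
    except TypeError:
        outcome = "type error"

explicit = resolve_context("ctx")  # an explicit context is passed through
```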
The corresponding test is in the PDL dialect since it is where I first
observed the behavior. Core types are not using the `mlir_type_subclass`
mechanism and are immune to the problem, so cannot be used for checking.
The current implementation of the WMMA intrinsic ops as they are defined
in the ROCDL tablegen is incorrect. They represent as operands what
should be attributes such as `clamp`, `opsel`, `signA/signB`. This
change performs a refactoring to bring it in line with what we expect.
---------
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
This makes it similar to `mlir::TypedValue` in the MLIR C++ API and
allows users to be more specific about the values they produce or
accept.
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This PR exposes `linalg::inferContractionDims(ArrayRef<AffineMap>)` to
Python, allowing users to infer contraction dimensions (batch/m/n/k)
directly from a list of affine maps without needing an operation.
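To make the batch/m/n/k classification concrete, here is a simplified pure-Python sketch of the idea. It assumes each indexing map is a projected permutation, written as a tuple of loop-dimension indices, and it does not reproduce the checks of the real C++ implementation; the function name and return shape are illustrative only and are not the exposed Python API.

```python
# Simplified sketch: maps are tuples of loop-dimension indices, so only
# projected permutations are modeled. Not the actual MLIR implementation.
def infer_contraction_dims(lhs_map, rhs_map, out_map):
    a, b, c = set(lhs_map), set(rhs_map), set(out_map)
    return {
        "batch": sorted(a & b & c),  # used by both inputs and the output
        "m": sorted((a & c) - b),    # lhs and output only
        "n": sorted((b & c) - a),    # rhs and output only
        "k": sorted((a & b) - c),    # both inputs, reduced away in output
    }


# Matmul with loops (d0, d1, d2) = (m, n, k):
#   lhs: (d0, d2), rhs: (d2, d1), out: (d0, d1)
matmul = infer_contraction_dims((0, 2), (2, 1), (0, 1))

# Batch matmul with loops (d0, d1, d2, d3) = (b, m, n, k).
batch_matmul = infer_contraction_dims((0, 1, 3), (0, 3, 2), (0, 1, 2))
```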
---------
Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
The C++ index switch op has utilities for `getCaseBlock(int i)` and
`getDefaultBlock()`, so these have been added.
Optional body builder args have been added: one for the default case and
one for the switch cases.
Updates the derived Op-classes for the main transform ops to have all
the arguments, etc, from the auto-generated classes. Additionally
updates and adds missing snake_case wrappers for the derived classes
which shadow the snake_case wrappers of the auto-generated classes,
which were hitherto exposed alongside the derived classes.
Adds the first XeGPU transform op, `xegpu.set_desc_layout`, which attaches a `xegpu.layout` attribute to the descriptor that a `xegpu.create_nd_tdesc` op returns.
Add builders on the Python side that match builders on the C++ side, add tests for launching GPU kernels and regions, and correct some small documentation mistakes. This reflects the API decisions already made in the func dialect's Python bindings and makes the GPU dialect's bindings work more like the C++ interface.
By allowing `transform.smt.constrain_params`'s region to yield SMT-vars,
op instances can declare relationships, through constraints, on incoming
params-as-SMT-vars and outgoing SMT-vars-as-params. This makes it
possible to declare that computations on params should be performed.
The semantics are that the yielded SMT-vars should be from any valid
satisfying assignment/model of the constraints in the region.
This test passed locally because I had a python environment with the
`python` command available, but I should have used the `%PYTHON` lit
command substitution instead. Fixes buildbot failures from #163620.
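The reason the substitution matters can be shown with a toy expansion of a RUN line. This is only a sketch: lit's real substitution logic is more involved, and the `expand_run_line` helper and the `test.py` file name are hypothetical.

```python
import sys


def expand_run_line(line, substitutions):
    """Naively replace each substitution key in a RUN line (toy model)."""
    for key, value in substitutions.items():
        line = line.replace(key, value)
    return line


# %PYTHON resolves to the configured interpreter, so the test does not
# depend on a bare `python` command being on PATH.
substitutions = {"%PYTHON": sys.executable, "%s": "test.py"}
command = expand_run_line("%PYTHON %s | FileCheck %s", substitutions)
```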
Adds initial support for Python bindings to the OpenACC dialect.
* The bindings do not provide any niceties yet, just the barebones
exposure of the dialect to Python. Construction of OpenACC ops is
therefore verbose and somewhat inconvenient, as evidenced by the test.
* The test only constructs one module, but I attempted to use enough
operations to be meaningful. It does not test all the ops exposed, but
does contain a realistic example of a memcpy idiom.
The func dialect provides a more pythonic interface for constructing
operations, but the gpu dialect does not; this is the first PR to
provide the same conveniences for the gpu dialect, starting with the
gpu.func op.
Changes to linalg `structured.fuse` transform op:
* Adds an optional `use_forall` boolean argument which generates a tiled
`scf.forall` loop instead of `scf.for` loops.
* `tile_sizes` can now be any parameter or handle.
* `tile_interchange` can now be any parameter or handle.
* IR formatting changes from `transform.structured.fuse %0 [4, 8] ...`
to `transform.structured.fuse %0 tile_sizes [4, 8] ...`
- boolean arguments are now `UnitAttrs` and should be set via the op
attr-dict: `{apply_cleanup, use_forall}`