llvm-project/mlir/lib/Bindings/Python/IRInterfaces.cpp
Maksim Levental f0ef5dba6d
[mlir][Python] create MLIRPythonSupport (#171775)
# What

This PR adds a shared library `MLIRPythonSupport` which contains all of
the CRTP classes like `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute`, as well as other useful code like `Defaulting*`,
enabling their reuse in downstream projects. Downstream projects
can now do

```c++
struct PyTestType : mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteType<PyTestType> {
  ...
};

class PyTestAttr : public mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteAttribute<PyTestAttr> {
  ...
};

NB_MODULE(_mlirPythonTestNanobind, m) {
  PyTestType::bind(m);
  PyTestAttr::bind(m);
}
```

instead of using the discordant alternative
`mlir_type_subclass`/`mlir_attr_subclass` (same goes for
`PyConcreteValue`/`mlir_value_subclass`).
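
For readers less familiar with the CRTP shape these classes rely on, here is a minimal, stdlib-only sketch. Everything in it (`FakeModule`, `ConcreteBase`, `TestType`, `TestAttr`, `bindAll`) is a hypothetical stand-in and not part of the MLIR or nanobind APIs; it only illustrates how a single `bind` in the base can reach into the derived class for its name and an optional `bindDerived` hook.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a nanobind module: it only records the names of
// the classes and methods that get "bound".
struct FakeModule {
  std::vector<std::string> registered;
};

// CRTP base: `bind` is written once and looks up the derived class's static
// `pyClassName` and `bindDerived` hook at compile time.
template <typename Derived>
struct ConcreteBase {
  static void bind(FakeModule &m) {
    m.registered.push_back(Derived::pyClassName);
    Derived::bindDerived(m);
  }
  // Default hook; derived classes may shadow it to add their own bindings.
  static void bindDerived(FakeModule &) {}
};

// Uses the default (empty) hook inherited from the base.
struct TestType : ConcreteBase<TestType> {
  static constexpr const char *pyClassName = "TestType";
};

// Shadows the hook to register one extra (made-up) method name.
struct TestAttr : ConcreteBase<TestAttr> {
  static constexpr const char *pyClassName = "TestAttr";
  static void bindDerived(FakeModule &m) {
    m.registered.push_back("TestAttr.get");
  }
};

// Binds both classes into a fresh module and returns what was registered.
std::vector<std::string> bindAll() {
  FakeModule m;
  TestType::bind(m);
  TestAttr::bind(m);
  assert(!m.registered.empty());
  return m.registered;
}
```

Calling `bindAll()` yields the two class names plus the extra entry contributed by `TestAttr::bindDerived`. The real classes additionally register with nanobind and take part in the Python object hierarchy, which this sketch does not attempt to model.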

# Why

This PR is mostly code motion (along with CMake) but before I describe
the changes I want to state the goals/benefits:

1. Currently upstream "core" extensions and "dialect" extensions ([all
of the `Dialect*` extensions
here](d7c734b5a1/mlir/lib/Bindings/Python))
are a two-tier system;
**a**. [core
extensions](https://github.com/llvm/llvm-project/blob/main/mlir/lib/Bindings/Python/IRTypes.cpp#L361)
enjoy first class support as far as type inference[^3], type stub
generation, and ease of implementation, while dialect extensions [have
poorer support](https://reviews.llvm.org/D150927), incorrect type stub
generation, and a much more tedious (boilerplate-heavy) implementation;
**b**. Crucially, this two-tiered system is reflected in the fact that
**the two sets of types/attributes are not in the same Python object
hierarchy**. To wit: `isinstance(..., Type)` and `isinstance(...,
Attribute)` are not supported for the dialect extensions[^2];
**c**. Since these types are not exposed in public headers, downstream
users (dialect extensions or not) cannot write functions that overload
on e.g. `PyFloat8*Type` - that's quite a [useful
feature](fdbee98df8/cpp_ext/TorchOps.cpp (L29-L69))!
2. The dialect extensions incur a sizeable performance penalty relative
to the core extensions in that every single trip across the wire (either
`python->cpp` or `cpp->python`) requires work in addition to nanobind's
own casting/construction pipeline;
**a**. When going from `python->cpp`, [we extract the capsule object
from the Python
object](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Bindings/Python/NanobindAdaptors.h#L219C24-L219C46)
and then extract from the capsule the `Mlir*` opaque struct/ptr. This
side isn't so onerous;
**b**. When going from `cpp->python` we call Python `import` APIs long-hand
and construct the Python object using `_CAPICreate`. Note,
there are at least 2 `attr` calls incurred in addition to `_CAPICreate`;
this is already much more [efficiently handled by nanobind
itself](4ba51fcf79/src/nb_internals.h (L381-L382))!
3. This division blocks various features: in some configurations[^1] we
trigger a circular import bug because "dialect" types and attributes
perform an [import of the root `_mlir`
module](bd9651bf78/mlir/include/mlir/Bindings/Python/NanobindAdaptors.h (L585))
when they are created (the types themselves, not even instances of those
types). This blocks type stub generation for dialect extensions (i.e.,
the reason we currently only generate type stubs for `_mlir`).

# How

Previously this was not done/possible because of "ODR" issues but I have
resolved those issues; the basic idea for how we solve this is "move
things we want to share into shared libraries":

1. Move IRCore (stuff like `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute`) into `MLIRPythonSupport`;
- Note, we move the rest of the things in `IRModule.h` (renamed to
`IRCore.h`) because `PyConcreteValue`, `PyConcreteType`,
`PyConcreteAttribute` depend on them. This makes for a bigger PR than
one would hope for but ultimately I think we should give people access
to these classes to use as they see fit (specifically inherit from, but
also liberally use in bindings signatures instead of the opaque `Mlir*`
struct wrappers).
2. Put all of this code into a nested namespace
`MLIR_BINDINGS_PYTHON_DOMAIN` which is determined by a compile time
define (and tied to `MLIR_BINDINGS_PYTHON_NB_DOMAIN`). This is necessary
in order to prevent conflicts on both symbol name **and** typeid
(necessary for nanobind to not double register bound types) between
multiple bindings libraries (e.g., `torch-mlir`, and `jax`). Note
[nanobind doesn't support `module_local` like
pybind11](https://nanobind.readthedocs.io/en/latest/porting.html#removed-features).
It does support `NB_DOMAIN` but that is not sufficient for
disambiguating typeids across projects (to wit: we currently define
`NB_DOMAIN` and it was still necessary to move everything to a nested
namespace);
3. Build the [nanobind library itself as a shared
object](https://github.com/wjakob/nanobind/blob/master/cmake/nanobind-config.cmake#L127)
(and link it to both the extensions and `MLIRPythonSupport`).
4. CMake to make this work, in-tree, out-of-tree, downstream, upstream,
etc.
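
To illustrate item 2, the following stdlib-only sketch shows why a macro-selected nested namespace gives each bindings package its own types. `DEMO_DOMAIN`, `demo`, `PyConcreteThing`, `domainsAreDistinct`, and `checkDomains` are hypothetical names chosen for the example; a real build would set the define once per package rather than redefining it in one file, and the sketch does not reproduce nanobind's type registration or symbol resolution across shared libraries.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// DEMO_DOMAIN is a hypothetical stand-in for the compile-time define. It is
// redefined here only so both "packages" fit in one translation unit.
#define DEMO_DOMAIN projectA
namespace demo {
namespace DEMO_DOMAIN {
struct PyConcreteThing {
  static std::string owner() { return "projectA"; }
};
} // namespace DEMO_DOMAIN
} // namespace demo
#undef DEMO_DOMAIN

#define DEMO_DOMAIN projectB
namespace demo {
namespace DEMO_DOMAIN {
struct PyConcreteThing {
  static std::string owner() { return "projectB"; }
};
} // namespace DEMO_DOMAIN
} // namespace demo
#undef DEMO_DOMAIN

// The two classes share an unqualified name but live in different namespaces,
// so they are distinct types with distinct std::type_info objects.
bool domainsAreDistinct() {
  return typeid(demo::projectA::PyConcreteThing) !=
         typeid(demo::projectB::PyConcreteThing);
}

// Small self-check that can be called from any entry point.
void checkDomains() { assert(domainsAreDistinct()); }
```

Because the fully qualified names differ, both the symbols and the `typeid`s differ, which is the property the nested namespace is meant to provide on top of `NB_DOMAIN`.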

# Testing

Three tests are added here:

1. `PythonTestModuleNanobind` is ported to use
`PyConcreteType<PyTestType>` instead of `mlir_type_subclass` and
`PyConcreteAttribute<PyTestAttr>` instead of `mlir_attr_subclass`,
verifying this works for non-core extensions in-tree;
2. `StandaloneExtensionNanobind` is ported to use `struct PyCustomType :
mlir::python::MLIR_BINDINGS_PYTHON_DOMAIN::PyConcreteType<PyCustomType>`
instead of `mlir_type_subclass` verifying this works for non-core
extensions out-of-tree;
3. `StandaloneExtensionNanobind`'s `smoketest` is extended to also load
another bindings package (namely `mlir`) verifying
`MLIR_BINDINGS_PYTHON_DOMAIN` successfully disambiguates symbols and
typeids.

I have also tested this downstream:
https://github.com/llvm/eudsl/pull/287 as well as run the following builder
bots:

mlir-nvidia-gcc7:
https://lab.llvm.org/buildbot/#/buildrequests/6654424?redirect_to_build=true

I have also tested against IREE:
https://github.com/iree-org/iree/pull/21916

# Integration

It is highly recommended to set the CMake var
`MLIR_BINDINGS_PYTHON_NB_DOMAIN` (which will also determine
`MLIR_BINDINGS_PYTHON_DOMAIN`) to something unique for each downstream.
This can also be passed explicitly to `add_mlir_python_modules` if your
project builds multiple bindings packages. I added a `WARNING` to this
effect in `AddMLIRPython.cmake`.

[^3]: Python values being typed correctly when exiting from cpp;
[^1]: Specifically when the modules are imported using `importlib`,
which occurs with nanobind's
[stubgen](https://github.com/wjakob/nanobind/blob/master/src/stubgen.py#L965);
[^2]: The workaround we implemented was a class method for the dialect
bindings called `Class.isinstance(...)`;
2026-01-05 09:08:13 -08:00

475 lines
18 KiB
C++

//===- IRInterfaces.cpp - MLIR IR interfaces pybind -----------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>
#include "mlir-c/BuiltinAttributes.h"
#include "mlir-c/IR.h"
#include "mlir-c/Interfaces.h"
#include "mlir-c/Support.h"
#include "mlir/Bindings/Python/IRCore.h"
#include "mlir/Bindings/Python/Nanobind.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
namespace nb = nanobind;
namespace mlir {
namespace python {
namespace MLIR_BINDINGS_PYTHON_DOMAIN {
constexpr static const char *constructorDoc =
R"(Creates an interface from a given operation/opview object or from a
subclass of OpView. Raises ValueError if the operation does not implement the
interface.)";
constexpr static const char *operationDoc =
R"(Returns an Operation for which the interface was constructed.)";
constexpr static const char *opviewDoc =
R"(Returns an OpView subclass _instance_ for which the interface was
constructed.)";
constexpr static const char *inferReturnTypesDoc =
R"(Given the arguments required to build an operation, attempts to infer
its return types. Raises ValueError on failure.)";
constexpr static const char *inferReturnTypeComponentsDoc =
R"(Given the arguments required to build an operation, attempts to infer
its return shaped type components. Raises ValueError on failure.)";
namespace {
/// Takes in an optional list of operands and converts them into a SmallVector
/// of MlirValues. Returns an empty SmallVector if the list is empty.
llvm::SmallVector<MlirValue> wrapOperands(std::optional<nb::list> operandList) {
llvm::SmallVector<MlirValue> mlirOperands;
if (!operandList || operandList->size() == 0) {
return mlirOperands;
}
// Note: as the list may contain other lists, this may not be the final size.
mlirOperands.reserve(operandList->size());
for (const auto &&it : llvm::enumerate(*operandList)) {
if (it.value().is_none())
continue;
PyValue *val;
try {
val = nb::cast<PyValue *>(it.value());
if (!val)
throw nb::cast_error();
mlirOperands.push_back(val->get());
continue;
} catch (nb::cast_error &err) {
// Intentionally unhandled to try sequence below first.
(void)err;
}
try {
auto vals = nb::cast<nb::sequence>(it.value());
for (nb::handle v : vals) {
try {
val = nb::cast<PyValue *>(v);
if (!val)
throw nb::cast_error();
mlirOperands.push_back(val->get());
} catch (nb::cast_error &err) {
throw nb::value_error(
(llvm::Twine("Operand ") + llvm::Twine(it.index()) +
" must be a Value or Sequence of Values (" + err.what() + ")")
.str()
.c_str());
}
}
continue;
} catch (nb::cast_error &err) {
throw nb::value_error((llvm::Twine("Operand ") + llvm::Twine(it.index()) +
" must be a Value or Sequence of Values (" +
err.what() + ")")
.str()
.c_str());
}
throw nb::cast_error();
}
return mlirOperands;
}
/// Takes in an optional vector of PyRegions and returns a SmallVector of
/// MlirRegion. Returns an empty SmallVector if the list is empty.
llvm::SmallVector<MlirRegion>
wrapRegions(std::optional<std::vector<PyRegion>> regions) {
llvm::SmallVector<MlirRegion> mlirRegions;
if (regions) {
mlirRegions.reserve(regions->size());
for (PyRegion &region : *regions) {
mlirRegions.push_back(region);
}
}
return mlirRegions;
}
} // namespace
/// CRTP base class for Python classes representing MLIR Op interfaces.
/// Interface hierarchies are flat so no base class is expected here. The
/// derived class is expected to define the following static fields:
/// - `const char *pyClassName` - the name of the Python class to create;
/// - `GetTypeIDFunctionTy getInterfaceID` - the function producing the TypeID
/// of the interface.
/// Derived classes may redefine the `bindDerived(ClassTy &)` method to bind
/// interface-specific methods.
///
/// An interface class may be constructed from either an Operation/OpView object
/// or from a subclass of OpView. In the latter case, only the static interface
/// methods are available, similarly to calling ConcreteOp::staticMethod on the
/// C++ side. Implementations of concrete interfaces can use the `isStatic`
/// method to check whether the interface object was constructed from a class or
/// an operation/opview instance. The `getOpName` always succeeds and returns a
/// canonical name of the operation suitable for lookups.
template <typename ConcreteIface>
class PyConcreteOpInterface {
protected:
using ClassTy = nb::class_<ConcreteIface>;
using GetTypeIDFunctionTy = MlirTypeID (*)();
public:
/// Constructs an interface instance from an object that is either an
/// operation or a subclass of OpView. In the latter case, only the static
/// methods of the interface are accessible to the caller.
PyConcreteOpInterface(nb::object object, DefaultingPyMlirContext context)
: obj(std::move(object)) {
try {
operation = &nb::cast<PyOperation &>(obj);
} catch (nb::cast_error &) {
// Do nothing.
}
try {
operation = &nb::cast<PyOpView &>(obj).getOperation();
} catch (nb::cast_error &) {
// Do nothing.
}
if (operation != nullptr) {
if (!mlirOperationImplementsInterface(*operation,
ConcreteIface::getInterfaceID())) {
std::string msg = "the operation does not implement ";
throw nb::value_error((msg + ConcreteIface::pyClassName).c_str());
}
MlirIdentifier identifier = mlirOperationGetName(*operation);
MlirStringRef stringRef = mlirIdentifierStr(identifier);
opName = std::string(stringRef.data, stringRef.length);
} else {
try {
opName = nb::cast<std::string>(obj.attr("OPERATION_NAME"));
} catch (nb::cast_error &) {
throw nb::type_error(
"Op interface does not refer to an operation or OpView class");
}
if (!mlirOperationImplementsInterfaceStatic(
mlirStringRefCreate(opName.data(), opName.length()),
context.resolve().get(), ConcreteIface::getInterfaceID())) {
std::string msg = "the operation does not implement ";
throw nb::value_error((msg + ConcreteIface::pyClassName).c_str());
}
}
}
/// Creates the Python bindings for this class in the given module.
static void bind(nb::module_ &m) {
nb::class_<ConcreteIface> cls(m, ConcreteIface::pyClassName);
cls.def(nb::init<nb::object, DefaultingPyMlirContext>(), nb::arg("object"),
nb::arg("context") = nb::none(), constructorDoc)
.def_prop_ro("operation", &PyConcreteOpInterface::getOperationObject,
operationDoc)
.def_prop_ro("opview", &PyConcreteOpInterface::getOpView, opviewDoc);
ConcreteIface::bindDerived(cls);
}
/// Hook for derived classes to add class-specific bindings.
static void bindDerived(ClassTy &cls) {}
/// Returns `true` if this object was constructed from a subclass of OpView
/// rather than from an operation instance.
bool isStatic() { return operation == nullptr; }
/// Returns the operation instance from which this object was constructed.
/// Throws a type error if this object was constructed from a subclass of
/// OpView.
nb::typed<nb::object, PyOperation> getOperationObject() {
if (operation == nullptr)
throw nb::type_error("Cannot get an operation from a static interface");
return operation->getRef().releaseObject();
}
/// Returns the opview of the operation instance from which this object was
/// constructed. Throws a type error if this object was constructed from a
/// subclass of OpView.
nb::typed<nb::object, PyOpView> getOpView() {
if (operation == nullptr)
throw nb::type_error("Cannot get an opview from a static interface");
return operation->createOpView();
}
/// Returns the canonical name of the operation this interface is constructed
/// from.
const std::string &getOpName() { return opName; }
private:
PyOperation *operation = nullptr;
std::string opName;
nb::object obj;
};
/// Python wrapper for InferTypeOpInterface. This interface has only static
/// methods.
class PyInferTypeOpInterface
: public PyConcreteOpInterface<PyInferTypeOpInterface> {
public:
using PyConcreteOpInterface<PyInferTypeOpInterface>::PyConcreteOpInterface;
constexpr static const char *pyClassName = "InferTypeOpInterface";
constexpr static GetTypeIDFunctionTy getInterfaceID =
&mlirInferTypeOpInterfaceTypeID;
/// C-style user-data structure for type appending callback.
struct AppendResultsCallbackData {
std::vector<PyType> &inferredTypes;
PyMlirContext &pyMlirContext;
};
/// Appends the types provided as the two first arguments to the user-data
/// structure (expects AppendResultsCallbackData).
static void appendResultsCallback(intptr_t nTypes, MlirType *types,
void *userData) {
auto *data = static_cast<AppendResultsCallbackData *>(userData);
data->inferredTypes.reserve(data->inferredTypes.size() + nTypes);
for (intptr_t i = 0; i < nTypes; ++i) {
data->inferredTypes.emplace_back(data->pyMlirContext.getRef(), types[i]);
}
}
/// Given the arguments required to build an operation, attempts to infer its
/// return types. Throws value_error on failure.
std::vector<PyType>
inferReturnTypes(std::optional<nb::list> operandList,
std::optional<PyAttribute> attributes, void *properties,
std::optional<std::vector<PyRegion>> regions,
DefaultingPyMlirContext context,
DefaultingPyLocation location) {
llvm::SmallVector<MlirValue> mlirOperands =
wrapOperands(std::move(operandList));
llvm::SmallVector<MlirRegion> mlirRegions = wrapRegions(std::move(regions));
std::vector<PyType> inferredTypes;
PyMlirContext &pyContext = context.resolve();
AppendResultsCallbackData data{inferredTypes, pyContext};
MlirStringRef opNameRef =
mlirStringRefCreate(getOpName().data(), getOpName().length());
MlirAttribute attributeDict =
attributes ? attributes->get() : mlirAttributeGetNull();
MlirLogicalResult result = mlirInferTypeOpInterfaceInferReturnTypes(
opNameRef, pyContext.get(), location.resolve(), mlirOperands.size(),
mlirOperands.data(), attributeDict, properties, mlirRegions.size(),
mlirRegions.data(), &appendResultsCallback, &data);
if (mlirLogicalResultIsFailure(result)) {
throw nb::value_error("Failed to infer result types");
}
return inferredTypes;
}
static void bindDerived(ClassTy &cls) {
cls.def("inferReturnTypes", &PyInferTypeOpInterface::inferReturnTypes,
nb::arg("operands") = nb::none(),
nb::arg("attributes") = nb::none(),
nb::arg("properties") = nb::none(), nb::arg("regions") = nb::none(),
nb::arg("context") = nb::none(), nb::arg("loc") = nb::none(),
inferReturnTypesDoc);
}
};
/// Wrapper around a shaped type components.
class PyShapedTypeComponents {
public:
PyShapedTypeComponents(MlirType elementType) : elementType(elementType) {}
PyShapedTypeComponents(nb::list shape, MlirType elementType)
: shape(std::move(shape)), elementType(elementType), ranked(true) {}
PyShapedTypeComponents(nb::list shape, MlirType elementType,
MlirAttribute attribute)
: shape(std::move(shape)), elementType(elementType), attribute(attribute),
ranked(true) {}
PyShapedTypeComponents(PyShapedTypeComponents &) = delete;
PyShapedTypeComponents(PyShapedTypeComponents &&other) noexcept
: shape(other.shape), elementType(other.elementType),
attribute(other.attribute), ranked(other.ranked) {}
static void bind(nb::module_ &m) {
nb::class_<PyShapedTypeComponents>(m, "ShapedTypeComponents")
.def_prop_ro(
"element_type",
[](PyShapedTypeComponents &self) { return self.elementType; },
nb::sig("def element_type(self) -> Type"),
"Returns the element type of the shaped type components.")
.def_static(
"get",
[](PyType &elementType) {
return PyShapedTypeComponents(elementType);
},
nb::arg("element_type"),
"Create a shaped type components object with only the element "
"type.")
.def_static(
"get",
[](nb::list shape, PyType &elementType) {
return PyShapedTypeComponents(std::move(shape), elementType);
},
nb::arg("shape"), nb::arg("element_type"),
"Create a ranked shaped type components object.")
.def_static(
"get",
[](nb::list shape, PyType &elementType, PyAttribute &attribute) {
return PyShapedTypeComponents(std::move(shape), elementType,
attribute);
},
nb::arg("shape"), nb::arg("element_type"), nb::arg("attribute"),
"Create a ranked shaped type components object with attribute.")
.def_prop_ro(
"has_rank",
[](PyShapedTypeComponents &self) -> bool { return self.ranked; },
"Returns whether the given shaped type component is ranked.")
.def_prop_ro(
"rank",
[](PyShapedTypeComponents &self) -> std::optional<nb::int_> {
if (!self.ranked)
return {};
return nb::int_(self.shape.size());
},
"Returns the rank of the given ranked shaped type components. If "
"the shaped type components does not have a rank, None is "
"returned.")
.def_prop_ro(
"shape",
[](PyShapedTypeComponents &self) -> std::optional<nb::list> {
if (!self.ranked)
return {};
return nb::list(self.shape);
},
"Returns the shape of the ranked shaped type components as a list "
"of integers. Returns None if the shaped type component does not "
"have a rank.");
}
nb::object getCapsule();
static PyShapedTypeComponents createFromCapsule(nb::object capsule);
private:
nb::list shape;
MlirType elementType;
MlirAttribute attribute;
bool ranked{false};
};
/// Python wrapper for InferShapedTypeOpInterface. This interface has only
/// static methods.
class PyInferShapedTypeOpInterface
: public PyConcreteOpInterface<PyInferShapedTypeOpInterface> {
public:
using PyConcreteOpInterface<
PyInferShapedTypeOpInterface>::PyConcreteOpInterface;
constexpr static const char *pyClassName = "InferShapedTypeOpInterface";
constexpr static GetTypeIDFunctionTy getInterfaceID =
&mlirInferShapedTypeOpInterfaceTypeID;
/// C-style user-data structure for type appending callback.
struct AppendResultsCallbackData {
std::vector<PyShapedTypeComponents> &inferredShapedTypeComponents;
};
/// Appends the shaped type components provided as unpacked shape, element
/// type, attribute to the user-data.
static void appendResultsCallback(bool hasRank, intptr_t rank,
const int64_t *shape, MlirType elementType,
MlirAttribute attribute, void *userData) {
auto *data = static_cast<AppendResultsCallbackData *>(userData);
if (!hasRank) {
data->inferredShapedTypeComponents.emplace_back(elementType);
} else {
nb::list shapeList;
for (intptr_t i = 0; i < rank; ++i) {
shapeList.append(shape[i]);
}
data->inferredShapedTypeComponents.emplace_back(shapeList, elementType,
attribute);
}
}
/// Given the arguments required to build an operation, attempts to infer the
/// shaped type components. Throws value_error on failure.
std::vector<PyShapedTypeComponents> inferReturnTypeComponents(
std::optional<nb::list> operandList,
std::optional<PyAttribute> attributes, void *properties,
std::optional<std::vector<PyRegion>> regions,
DefaultingPyMlirContext context, DefaultingPyLocation location) {
llvm::SmallVector<MlirValue> mlirOperands =
wrapOperands(std::move(operandList));
llvm::SmallVector<MlirRegion> mlirRegions = wrapRegions(std::move(regions));
std::vector<PyShapedTypeComponents> inferredShapedTypeComponents;
PyMlirContext &pyContext = context.resolve();
AppendResultsCallbackData data{inferredShapedTypeComponents};
MlirStringRef opNameRef =
mlirStringRefCreate(getOpName().data(), getOpName().length());
MlirAttribute attributeDict =
attributes ? attributes->get() : mlirAttributeGetNull();
MlirLogicalResult result = mlirInferShapedTypeOpInterfaceInferReturnTypes(
opNameRef, pyContext.get(), location.resolve(), mlirOperands.size(),
mlirOperands.data(), attributeDict, properties, mlirRegions.size(),
mlirRegions.data(), &appendResultsCallback, &data);
if (mlirLogicalResultIsFailure(result)) {
throw nb::value_error("Failed to infer result shape type components");
}
return inferredShapedTypeComponents;
}
static void bindDerived(ClassTy &cls) {
cls.def("inferReturnTypeComponents",
&PyInferShapedTypeOpInterface::inferReturnTypeComponents,
nb::arg("operands") = nb::none(),
nb::arg("attributes") = nb::none(), nb::arg("properties") = nb::none(),
nb::arg("regions") = nb::none(), nb::arg("context") = nb::none(),
nb::arg("loc") = nb::none(), inferReturnTypeComponentsDoc);
}
};
void populateIRInterfaces(nb::module_ &m) {
PyInferTypeOpInterface::bind(m);
PyShapedTypeComponents::bind(m);
PyInferShapedTypeOpInterface::bind(m);
}
} // namespace MLIR_BINDINGS_PYTHON_DOMAIN
} // namespace python
} // namespace mlir