The current implementation of the WMMA intrinsic ops as they are defined
in the ROCDL tablegen is incorrect. They represent as operands what
should be attributes such as `clamp`, `opsel`, `signA/signB`. This
change performs a refactoring to bring it in line with what we expect.
---------
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
This makes it similar to `mlir::TypedValue` in the MLIR C++ API and
allows users to be more specific about the values they produce or
accept.
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This PR exposes `linalg::inferContractionDims(ArrayRef<AffineMap>)` to
Python, allowing users to infer contraction dimensions (batch/m/n/k)
directly from a list of affine maps without needing an operation.
---------
Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
The C++ index switch op has utilities for `getCaseBlock(int i)` and
`getDefaultBlock()`, so these have been added.
Optional body builder args have been added: one for the default case and
one for the switch cases.
Updates the derived Op-classes for the main transform ops to have all
the arguments, etc, from the auto-generated classes. Additionally
updates and adds missing snake_case wrappers for the derived classes
which shadow the snake_case wrappers of the auto-generated classes,
which were hitherto exposed alongside the derived classes.
Adds the first XeGPU transform op, `xegpu.set_desc_layout`, which attachs a `xegpu.layout` attribute to the descriptor that a `xegpu.create_nd_tdesc` op returns.
Currently ExecutionEngine tries to dump all functions declared in the
module, even those which are "external" (i.e., linked/loaded at
runtime). E.g.
```mlir
func.func private @printF32(f32)
func.func @supported_arg_types(%arg0: i32, %arg1: f32) {
call @printF32(%arg1) : (f32) -> ()
return
}
```
fails with
```
Could not compile printF32:
Symbols not found: [ __mlir_printF32 ]
Program aborted due to an unhandled Error:
Symbols not found: [ __mlir_printF32 ]
```
even though `printF32` can be provided at final build time (i.e., when
the object file is linked to some executable or shlib). E.g, if our own
`libmlir_c_runner_utils` is linked.
So just skip functions which have no bodies during dump (i.e., are decls
without defns).
https://github.com/llvm/llvm-project/pull/157930 changed `nb::object
getOwner()` to `PyOpView getOwner()` which implicitly constructs the
generic OpView against from a (possibly) concrete OpView. This PR fixes
that.
Add builders on the Python side that match builders in the C++ side, add tests for launching GPU kernels and regions, and correct some small documentation mistakes. This reflects the API decisions already made in the func dialect's Python bindings and makes use of the GPU dialect's bindings work more similar to C++ interface.
By allowing `transform.smt.constrain_params`'s region to yield SMT-vars,
op instances can declare relationships, through constraints, on incoming
params-as-SMT-vars and outgoing SMT-vars-as-params. This makes it
possible to declare that computations on params should be performed.
The semantics are that the yielded SMT-vars should be from any valid
satisfying assignment/model of the constraints in the region.
This test passed locally because I had a python environment with the
`python` command available, but I should have used the `%PYTHON` lit
command substitution instead. Fixes buildbot failures from #163620.
Adds initial support for Python bindings to the OpenACC dialect.
* The bindings do not provide any niceties yet, just the barebones
exposure of the dialect to Python. Construction of OpenACC ops is
therefore verbose and somewhat inconvenient, as evidenced by the test.
* The test only constructs one module, but I attempted to use enough
operations to be meaningful. It does not test all the ops exposed, but
does contain a realistic example of a memcpy idiom.
The func dialect provides a more pythonic interface for constructing
operations, but the gpu dialect does not; this is the first PR to
provide the same conveniences for the gpu dialect, starting with the
gpu.func op.
Changes to linalg `structured.fuse` transform op:
* Adds an optional `use_forall` boolean argument which generates a tiled
`scf.forall` loop instead of `scf.for` loops.
* `tile_sizes` can now be any parameter or handle.
* `tile_interchange` can now be any parameter or handle.
* IR formatting changes from `transform.structured.fuse %0 [4, 8] ...`
to `transform.structured.fuse %0 tile_sizes [4, 8] ...`
- boolean arguments are now `UnitAttrs` and should be set via the op
attr-dict: `{apply_cleanup, use_forall}`
This is a follow-up PR for #162699.
Currently, in the function where we define rewrite patterns, the `op` we
receive is of type `ir.Operation` rather than a specific `OpView` type
(such as `arith.AddIOp`). This means we can’t conveniently access
certain parts of the operation — for example, we need to use
`op.operands[0]` instead of `op.lhs`. The following example code
illustrates this situation.
```python
def to_muli(op, rewriter):
# op is typed ir.Operation instead of arith.AddIOp
pass
patterns.add(arith.AddIOp, to_muli)
```
In this PR, we convert the operation to its corresponding `OpView`
subclass before invoking the rewrite pattern callback, making it much
easier to write patterns.
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This PR adds support for defining custom **`RewritePattern`**
implementations directly in the Python bindings.
Previously, users could define similar patterns using the PDL dialect’s
bindings. However, for more complex patterns, this often required
writing multiple Python callbacks as PDL native constraints or rewrite
functions, which made the overall logic less intuitive—though it could
be more performant than a pure Python implementation (especially for
simple patterns).
With this change, we introduce an additional, straightforward way to
define patterns purely in Python, complementing the existing PDL-based
approach.
### Example
```python
def to_muli(op, rewriter):
with rewriter.ip:
new_op = arith.muli(op.operands[0], op.operands[1], loc=op.location)
rewriter.replace_op(op, new_op.owner)
with Context():
patterns = RewritePatternSet()
patterns.add(arith.AddIOp, to_muli) # a pattern that rewrites arith.addi to arith.muli
frozen = patterns.freeze()
module = ...
apply_patterns_and_fold_greedily(module, frozen)
```
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
`PassManager::enableStatistics` seems currently missing in both C API
and Python bindings. So here we added them in this PR, which includes
the `PassDisplayMode` enum type and the `EnableStatistics` method.
In [#160520](https://github.com/llvm/llvm-project/pull/160520), we
discussed the current limitations of PDL rewriting in Python (see [this
comment](https://github.com/llvm/llvm-project/pull/160520#issuecomment-3332326184)).
At the moment, we cannot create new operations in PDL native (python)
rewrite functions because the `PatternRewriter` APIs are not exposed.
This PR introduces bindings to retrieve the insertion point of the
`PatternRewriter`, enabling users to create new operations within Python
rewrite functions. With this capability, more complex rewrites e.g. with
branching and loops that involve op creations become possible.
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This op enables expressing uncertainty regarding what should be
happening at particular places in transform-dialect schedules. In
particular, it enables representing a choice among alternative regions.
This choice is resolved through providing a `selected_region` argument.
When this argument is provided, the semantics are such that it is valid
to rewrite the op through substituting in the selected region -- with
the op's interpreted semantics corresponding to exactly this.
This op represents another piece of the puzzle w.r.t. a toolkit for
expressing autotuning problems with the transform dialect. Note that
this goes beyond tuning knobs _on_ transforms, going further by making
it tunable which (sequences of) transforms are to be applied.
Some of the current gettors require passing locations (i.e., there be an
active location) because they're using the "checked" APIs. This PR adds
"unchecked" gettors which only require an active context.
This is a follow-up to #159926.
That PR (#159926) exposed native rewrite function registration in PDL
through the C API and Python, enabling use with
`pdl.apply_native_rewrite`.
In this PR, we add support for native constraint functions in PDL via
`pdl.apply_native_constraint`, further completing the PDL API.
In the MLIR Python bindings, we can currently use PDL to define simple
patterns and then execute them with the greedy rewrite driver. However,
when dealing with more complex patterns—such as constant folding for
integer addition—we find that we need `apply_native_rewrite` to actually
perform arithmetic (i.e., compute the sum of two constants). For
example, consider the following PDL pseudocode:
```mlir
pdl.pattern : benefit(1) {
%a0 = pdl.attribute
%a1 = pdl.attribute
%c0 = pdl.operation "arith.constant" {value = %a0}
%c1 = pdl.operation "arith.constant" {value = %a1}
%op = pdl.operation "arith.addi"(%c0, %c1)
%sum = pdl.apply_native_rewrite "addIntegers"(%a0, %a1)
%new_cst = pdl.operation "arith.constant" {value = %sum}
pdl.replace %op with %new_cst
}
```
Here, `addIntegers` cannot be expressed in PDL alone—it requires a
*native rewrite function*. This PR introduces a mechanism to support
exactly that, allowing complex rewrite patterns to be expressed in
Python and enabling many passes to be implemented directly in Python as
well.
As a test case, we defined two new operations (`myint.constant` and
`myint.add`) in Python and implemented a constant-folding rewrite
pattern for them. The core code looks like this:
```python
m = Module.create()
with InsertionPoint(m.body):
@pdl.pattern(benefit=1, sym_name="myint_add_fold")
def pat():
...
op0 = pdl.OperationOp(name="myint.add", args=[v0, v1], types=[t])
@pdl.rewrite()
def rew():
sum = pdl.apply_native_rewrite(
[pdl.AttributeType.get()], "add_fold", [a0, a1]
)
newOp = pdl.OperationOp(
name="myint.constant", attributes={"value": sum}, types=[t]
)
pdl.ReplaceOp(op0, with_op=newOp)
def add_fold(rewriter, results, values):
a0, a1 = values
results.push_back(IntegerAttr.get(i32, a0.value + a1.value))
pdl_module = PDLModule(m)
pdl_module.register_rewrite_function("add_fold", add_fold)
```
The idea is previously discussed in Discord #mlir-python channel with
@makslevental.
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
This is a follow-up to https://github.com/llvm/llvm-project/pull/144307,
where we removed `vector.matrix_multiply` and `vector.flat_transpose`
from the Vector dialect.
This PR:
* Updates comments that were missed in the previous change.
* Renames relevant `-convert-vector-to-llvm=` options:
- `vector-contract-lowering=matmul` → `vector-contract-lowering=llvmintr`
- `vector-transpose-lowering=flat_transpose` → `vector-transpose-lowering=llvmintr`
These new names better reflect the actual transformation target - LLVM
intrinsics - rather than the now-removed abstract operations.
Introduces a Transform-dialect SMT-extension so that we can have an op
to express constrains on Transform-dialect params, in particular when
these params are knobs -- see transform.tune.knob -- and can hence be
seen as symbolic variables. This op allows expressing joint constraints
over multiple params/knobs together.
While the op's semantics are clearly defined, per SMTLIB, the interpreted
semantics -- i.e. the `apply()` method -- for now just defaults to failure. In
the future we should support attaching an implementation so that users
can Bring Your Own Solver and thereby control performance of
interpreting the op. For now the main usage is to walk schedule IR and
collect these constraints so that knobs can be rewritten to constants that
satisfy the constraints.
This a reland of https://github.com/llvm/llvm-project/pull/155741 which
was reverted at https://github.com/llvm/llvm-project/pull/157831. This
version is narrower in scope - it only turns on automatic stub
generation for `MLIRPythonExtension.Core._mlir` and **does not do
anything automatically**. Specifically, the only CMake code added to
`AddMLIRPython.cmake` is the `mlir_generate_type_stubs` function which
is then used only in a manual way. The API for
`mlir_generate_type_stubs` is:
```
Arguments:
MODULE_NAME: The fully-qualified name of the extension module (used for importing in python).
DEPENDS_TARGETS: List of targets these type stubs depend on being built; usually corresponding to the
specific extension module (e.g., something like StandalonePythonModules.extension._standaloneDialectsNanobind.dso)
and the core bindings extension module (e.g., something like StandalonePythonModules.extension._mlir.dso).
OUTPUT_DIR: The root output directory to emit the type stubs into.
OUTPUTS: List of expected outputs.
DEPENDS_TARGET_SRC_DEPS: List of cpp sources for extension library (for generating a DEPFILE).
IMPORT_PATHS: List of paths to add to PYTHONPATH for stubgen.
PATTERN_FILE: (Optional) Pattern file (see https://nanobind.readthedocs.io/en/latest/typing.html#pattern-files).
Outputs:
NB_STUBGEN_CUSTOM_TARGET: The target corresponding to generation which other targets can depend on.
```
Downstream users should use `mlir_generate_type_stubs` in coordination
with `declare_mlir_python_sources` to turn on stub generation for their
own downstream dialect extensions and upstream dialect extensions if
they so choose. Standalone example shows an example.
Note, downstream will also need to set
`-DMLIR_PYTHON_PACKAGE_PREFIX=...` correctly for their bindings.
In this PR we add basic python bindings for IRDL dialect, so that python
users can create and load IRDL dialects in python. This allows users, to
some extent, to define dialects in Python without having to modify
MLIR’s CMake/TableGen/C++ code and rebuild, making prototyping more
convenient.
A basic example is shown below (and also in the added test case):
```python
# create a module with IRDL dialects
module = Module.create()
with InsertionPoint(module.body):
dialect = irdl.DialectOp("irdl_test")
with InsertionPoint(dialect.body):
op = irdl.OperationOp("test_op")
with InsertionPoint(op.body):
f32 = irdl.is_(TypeAttr.get(F32Type.get()))
irdl.operands_([f32], ["input"], [irdl.Variadicity.single])
# load the module
irdl.load_dialects(module)
# use the op defined in IRDL
m = Module.parse("""
module {
%a = arith.constant 1.0 : f32
"irdl_test.test_op"(%a) : (f32) -> ()
}
""")
```
This PR adds support in mlir-tblgen for generating docstrings for each
Python class corresponding to an MLIR op. The docstrings are currently
derived from the op’s description in ODS, with indentation adjusted to
display nicely in Python. This makes it easier for Python users to see
the op descriptions directly in their IDE or LSP while coding.
In the future, we can extend the docstrings to include explanations for
each method, attribute, and so on.
This idea was previously discussed in the `#mlir-python` channel on
Discord with @makslevental and @superbobry.
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
Currently the type hints on the returns of the "value builders" are
`ir.Value`, `Sequence[ir.Value]`, and `ir.Operation`, none of which are
correct. The correct possibilities are `ir.OpResult`, `ir.OpResultList`,
the OpView class itself (e.g., `AttrSizedResultsOp`) or the union of the
3 (for variadic results). This PR fixes those hints.