Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter-pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR into a module when necessary.
Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.
Only the tests in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
This revision significantly simplifies the specification and implementation of mapping loops to GPU ids.
Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes:
1. a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, in which predication
may occur on a per-dimension basis, selecting a 3-D sub-rectangle.
2. an n-D linearized mode, in which predication may only occur on a linear basis.
In the process, better inference of size and alignment requirements is introduced, along with improved runtime verification messages.
The `warp_dims` attribute was deemed confusing and is removed from the transform in favor of better size inference.
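The two predication modes can be contrasted with a small Python model. This is a hedged sketch, not the MLIR implementation. The helper names (`grid_predicate`, `linear_id`, `linear_predicate`) are hypothetical. The sketch assumes that x varies fastest when flattening thread ids, and it ignores the alignment requirements on threadIdx.x.

```python
from math import prod


def grid_predicate(tid, loop_sizes):
    """Grid-like mode: a thread is active iff each coordinate lies inside the
    loop's 3-D sub-rectangle (predication is per dimension)."""
    return all(t < s for t, s in zip(tid, loop_sizes))


def linear_id(tid, block_dims):
    """Flatten an (x, y, z) thread id, with x varying fastest."""
    tx, ty, tz = tid
    bx, by, _ = block_dims
    return tx + bx * (ty + by * tz)


def linear_predicate(tid, block_dims, loop_sizes):
    """Linearized mode: a thread is active iff its flattened id is among the
    first prod(loop_sizes) ids; loop_sizes may have any rank."""
    return linear_id(tid, block_dims) < prod(loop_sizes)


if __name__ == "__main__":
    block = (8, 4, 1)
    threads = [(x, y, z) for z in range(block[2])
               for y in range(block[1]) for x in range(block[0])]
    grid_active = [t for t in threads if grid_predicate(t, (4, 2, 1))]
    linear_active = [t for t in threads if linear_predicate(t, block, (4, 2))]
    # Both modes activate 8 threads, but not the same 8.
    print(len(grid_active), len(linear_active))
    print((4, 0, 0) in grid_active, (4, 0, 0) in linear_active)
```

Running it prints `8 8` and then `False True`. Both modes enable eight threads in an 8x4x1 block. The grid-like mode keeps a 4x2 sub-rectangle, while the linearized mode keeps the first eight flattened ids, which all lie in row y = 0.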
Differential Revision: https://reviews.llvm.org/D155941
Update operations in Transform dialect extensions defined in the Affine,
GPU, MemRef and Tensor dialects to use the more generic
`TransformHandleTypeInterface` type constraint instead of hardcoding
`PDL_Operation`. See
https://discourse.llvm.org/t/rfc-type-system-for-the-transform-dialect/65702
for motivation.
Remove the dependency on PDLDialect from these extensions.
Update tests to use `!transform.any_op` instead of `!pdl.operation`.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150781
c59465e1203dd78d06e15f7ddf62141807dbd5a7 introduced mapping to warps and
linear GPU ids.
In the implementation, the delinearization basis is reversed from [x, y, z]
to [z, y, x] order to properly compute the strides and allow delinearization.
Prior to this commit, we forgot to reverse it back to [x, y, z] order
before materializing the indices.
Fix this oversight.
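The oversight can be reproduced with a small Python model of delinearization. This is a sketch under stated assumptions, not the MLIR code:

- `delinearize` and its `reverse_back` flag are hypothetical names.
- x is assumed to vary fastest.
- The linear id is assumed to be smaller than the product of the basis.

Passing `reverse_back=False` mimics the behavior before this fix.

```python
from math import prod


def delinearize(linear_id, basis_xyz, reverse_back=True):
    """Split linear_id into per-dimension indices for a basis given in
    [x, y, z] order, where x varies fastest."""
    # Reverse to [z, y, x] so that row-major strides can be computed.
    basis_zyx = basis_xyz[::-1]
    # stride[i] is the product of all sizes after position i.
    strides = [prod(basis_zyx[i + 1:]) for i in range(len(basis_zyx))]
    indices_zyx = []
    remainder = linear_id
    for stride in strides:
        indices_zyx.append(remainder // stride)
        remainder %= stride
    # The fix: restore [x, y, z] order before materializing the indices.
    return indices_zyx[::-1] if reverse_back else indices_zyx


if __name__ == "__main__":
    basis = [4, 2, 3]  # x = 4, y = 2, z = 3
    print(delinearize(6, basis))                      # [2, 1, 0]
    print(delinearize(6, basis, reverse_back=False))  # [0, 1, 2]: x and z swapped
```

Without the final reversal, the index meant for z is materialized as x and vice versa. Since 6 = 2 + 4 * 1 + 8 * 0, the correct result is x = 2, y = 1, z = 0.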
This revision refactors the implementation of mapping to threads to additionally allow warps and linear ids to be specified.
`warp_dims` is currently specified along with `block_dims` as a transform attribute.
Linear ids, on the other hand, use the flattened block_dims to predicate on the first (linearized) k threads.
An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme.
Various implementation and transform op semantics cleanups are also applied.
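The following hedged Python sketch illustrates how warp ids can be derived from thread ids. It rests on these assumptions:

- Warps have 32 lanes.
- Thread ids are flattened with x varying fastest.
- The warp grid is given by `warp_dims` alongside `block_dims`.

`warp_coords` is a hypothetical helper, not an MLIR API. Threads whose warp falls outside `warp_dims` are predicated off.

```python
from math import prod

WARP_SIZE = 32  # assumption: 32 threads per warp


def warp_coords(tid, block_dims, warp_dims):
    """Return the (x, y, z) warp coordinates of a thread, or None when the
    thread's warp is not used by warp_dims (i.e. it is predicated off)."""
    if prod(block_dims) % WARP_SIZE != 0:
        raise ValueError("block size must be a multiple of the warp size")
    if prod(warp_dims) * WARP_SIZE > prod(block_dims):
        raise ValueError("block_dims provide too few threads for warp_dims")
    tx, ty, tz = tid
    bx, by, _ = block_dims
    linear = tx + bx * (ty + by * tz)
    warp_id = linear // WARP_SIZE
    if warp_id >= prod(warp_dims):
        return None
    wx, wy, _ = warp_dims
    return (warp_id % wx, (warp_id // wx) % wy, warp_id // (wx * wy))
```

Consider `block_dims = (64, 4, 1)`, which gives 8 warps, with `warp_dims = (2, 2, 1)`:

- Thread `(32, 1, 0)` has linear id 96. It lands in warp 3 and maps to warp coordinates `(1, 1, 0)`.
- Thread `(0, 2, 0)` has linear id 128. It lands in warp 4, which is outside `warp_dims`, so it is predicated off.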
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D146130
Change map_nested_foreach_to_threads to ignore foreach_thread ops that do
not map to threads. This will allow us to call
mapNestedForeachToThreadsImpl with different sets of ids to lower
multiple levels. Also add warpIds attributes.
Differential Revision: https://reviews.llvm.org/D143298
This allows much better verification messages in consuming ops that properly declare
`TransformHandleTypeInterface` on their operands.
Downstream tests can be updated with a command resembling:
```
git grep -l "structured\.match" mlir/test | xargs sed -i -e 's/\(structured\.match.*\)/\1 : (!pdl.operation) -> !pdl.operation/g'
```
Differential Revision: https://reviews.llvm.org/D142643
The given test fails with the error below. One `memref.store` and two `memref.load` ops are users of the loop index on which I perform RAUW, and `memref.store` comes first in the list. If I RAUW on it, the loop over `llvm::make_early_inc_range(threadIdx.getUsers())` no longer visits the two `memref.load` ops as users, so they remain unchanged. I'm not really certain why.
This change applies RAUW after collecting the users. If a better solution exists, I would be happy to implement it.
```
mlir-opt: ...llvm-project/mlir/include/mlir/IR/UseDefLists.h:175: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```
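The general failure pattern is mutating a use list while iterating over it. A toy Python model can illustrate it. This is only an analogy under stated assumptions:

- `Value`, `replace_all_uses_with`, `rewrite_buggy` and `rewrite_fixed` are hypothetical. They do not mirror MLIR's `Value` or `OpOperand` classes.
- A Python list stands in for the use list.

```python
class Value:
    """Toy SSA value that records the names of the ops using it."""

    def __init__(self, name, users=()):
        self.name = name
        self.users = list(users)

    def replace_all_uses_with(self, new_value):
        # Move every use to new_value; this value is left with no uses.
        new_value.users.extend(self.users)
        self.users.clear()


def rewrite_buggy(old, new):
    """Calls RAUW while iterating the live use list, so iteration stops early."""
    visited = []
    for user in old.users:
        visited.append(user)
        old.replace_all_uses_with(new)
    return visited


def rewrite_fixed(old, new):
    """Collects the users first, then applies RAUW once all have been visited."""
    users = list(old.users)  # snapshot taken before any mutation
    visited = []
    for user in users:
        visited.append(user)  # per-user work sees every user
    old.replace_all_uses_with(new)
    return visited


if __name__ == "__main__":
    users = ["memref.store", "memref.load", "memref.load"]
    print(rewrite_buggy(Value("%tid", users), Value("%new")))  # ['memref.store']
    print(rewrite_fixed(Value("%tid", users), Value("%new")))  # all three users
```

In the buggy variant, the first RAUW empties the list being iterated. The loop therefore ends after `memref.store`, and any per-user work is skipped for the two loads. Taking a snapshot first decouples iteration from mutation.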
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D138029
Previously, the need for a dense permutation leaked into the thread_dim_mapping specification.
This revision allows using a sparse specification of the thread_dim_mapping; the proper completion / sorting is applied automatically.
In the process, the semantics of scf.foreach_thread are tightened to require a matching number of thread dimensions and mappings.
The relevant negative test is added.
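Here is a hedged Python sketch of the completion / sorting step. It takes the sizes of a loop nest and a sparse mapping of each loop to a thread dimension (0 = x, 1 = y, 2 = z). It returns dense 3-D sizes in x, y, z order, filling unmapped dimensions with 1. The helper name and the integer encoding of dimensions are assumptions made for illustration. The length check mirrors the tightened requirement of a matching number of thread dimensions and mappings.

```python
def complete_mapping(sizes, mapping, rank=3):
    """Sort loop sizes into [x, y, z] order, filling unmapped dimensions with 1."""
    if len(sizes) != len(mapping):
        raise ValueError("number of loop dimensions and mappings must match")
    if len(set(mapping)) != len(mapping) or not all(0 <= d < rank for d in mapping):
        raise ValueError("mapping must list distinct dimensions in [0, rank)")
    dense = [1] * rank
    for size, dim in zip(sizes, mapping):
        dense[dim] = size
    return dense


if __name__ == "__main__":
    print(complete_mapping([4, 8], [1, 0]))  # [8, 4, 1]
    print(complete_mapping([16], [2]))       # [1, 1, 16]
```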
Differential Revision: https://reviews.llvm.org/D137906
`scf.foreach_thread` specifies how its loops map to processors via an integer array, as in the example below. A lowering can use this mapping. However, expressing the mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces a device mapping attribute to make the mapping descriptive and verifiable, and makes the GPU transform dialect use it.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
  scf.foreach_thread (%i2, %j2) in (%c1, %c2) {
    ...
  } {thread_dim_mapping = [0, 1]}
} {thread_dim_mapping = [0, 1]}
```
It first introduces `DeviceMappingInterface`, an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its own attributes and implement this interface as well, which provides a clear point for validation.
The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x/y/z>`). After this change, the code above prints as below, which makes the loop mappings explicit. The change also implements consumption of these two new attributes in the transform dialect: the transform dialect binds the outermost loops to thread blocks and the innermost loops to threads.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
  scf.foreach_thread (%i2, %j2) in (%c1, %c2) {
    ...
  } {thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} {thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137413
Many tests wrap the piece of the IR related to the transform dialect
into `transform.with_pdl_patterns` without actually using PDL patterns
inside. Some of these are leftovers from the migration to `structured.match`
and others are cargo cult; both are useless and pollute the tests.
Reviewed By: guraypp
Differential Revision: https://reviews.llvm.org/D135661
Use the recently introduced TransformTypeInterface instead of hardcoding
the PDLOperationType. This will allow the operations to use more
specific transform types to express pre/post-conditions in the future.
It requires the syntax and Python op construction API to be updated.
Dialect extensions will be switched separately.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D135584
This revision adds the GPU transform dialect. It also introduces the prefix "transform.gpu" for all ops related to this dialect.
MLIR already had two GPU transform ops in Linalg. This revision moves these ops into GPUTransformOps. The ops are as follows:
`transform.structured.map_nested_foreach_thread_to_gpu_blocks` -> `transform.gpu.map_foreach_to_blocks`
This op selects the outermost (top-level) foreach_thread and parallelizes it across GPU blocks. It can also generate a `gpu.launch`.
`transform.structured.map_nested_foreach_thread_to_gpu_threads` -> `transform.gpu.map_nested_foreach_to_threads`
This op parallelizes the nested foreach_thread ops inside a `gpu.launch` across GPU threads.
It doesn't add new functionality, but there is some minor refactoring of the code.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D134800