llvm-project

Author	SHA1	Message	Date
Oleksandr "Alex" Zinenko	e4384149b5	[mlir] use transform-interpreter in test passes (#70040) Update most test passes to use the transform-interpreter pass instead of the test-transform-dialect-interpreter-pass. The new "main" interpreter pass has a named entry point instead of looking up the top-level op with `PossibleTopLevelOpTrait`, which is arguably a more understandable interface. The change is mechanical, rewriting an unnamed sequence into a named one and wrapping the transform IR into a module when necessary. Add an option to the transform-interpreter pass to target a tagged payload op instead of the root anchor op, which is also useful for repro generation. Only the tests in the transform dialect proper and the examples have not been updated yet. These will be updated separately after a more careful consideration of testing coverage of the transform interpreter logic.	2023-10-24 16:12:34 +02:00
Nicolas Vasilache	44e6318cea	[mlir][transforms] Revamp the implementation of mapping loops to GPUs This revision significantly simplifies the specification and implementation of mapping loops to GPU ids. Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes: 1. a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, on which predication may occur on a per-dimension 3-D sub-rectangle basis. 2. a n-D linearized mode, on which predication may only occur on a linear basis. In the process, better size and alignment requirement inference are introduced along with improved runtime verification messages. The `warp_dims` attribute was deemed confusing and is removed from the transform in favor of better size inference. Differential Revision: https://reviews.llvm.org/D155941	2023-07-26 00:09:08 +02:00
Matthias Springer	dae8c72495	[mlir][linalg] TileToForallOp: Support memref ops Support tiling of ops with memref semantics. Differential Revision: https://reviews.llvm.org/D153353	2023-06-21 09:12:34 +02:00
Alex Zinenko	2f3ac28cb2	[mlir] don't hardcode PDL_Operation in Transform dialect extensions Update operations in Transform dialect extensions defined in the Affine, GPU, MemRef and Tensor dialects to use the more generic `TransformHandleTypeInterface` type constraint instead of hardcoding `PDL_Operation`. See https://discourse.llvm.org/t/rfc-type-system-for-the-transform-dialect/65702 for motivation. Remove the dependency on PDLDialect from these extensions. Update tests to use `!transform.any_op` instead of `!pdl.operation`. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D150781	2023-05-17 15:10:12 +00:00
Nicolas Vasilache	015cd84d3c	Revert "[mlir][Linalg][Transform] Avoid FunctionalStyleTransformOpTrait where unnecesseary to improve usability" This reverts commit 31aa8ea252c0b6acdcb362c1d0f01cc4b810d6d0. This is currently not in a good state as we have some footguns due to missing listeners.	2023-03-20 07:07:27 -07:00
Nicolas Vasilache	ba7f3e1d1e	[mlir][Transform] Fix support for mapping to GPU warps and to linear ids c59465e1203dd78d06e15f7ddf62141807dbd5a7 introduced mapping to warps and linear GPU ids. In the implementation, the delinearization basis is reversed from [x, y, z] to [z, y, x] order to properly compute the strides and allow delinearization. Prior to this commit, we forgot to reverse it back to [x, y, z] order before materializing the indices. Fix this oversight.	2023-03-20 05:23:17 -07:00
Nicolas Vasilache	31aa8ea252	[mlir][Linalg][Transform] Avoid FunctionalStyleTransformOpTrait where unnecesseary to improve usability Differential Revision: https://reviews.llvm.org/D146305	2023-03-20 03:17:44 -07:00
Nicolas Vasilache	c59465e120	[mlir][Transform] Add support for mapping to GPU warps and to linear ids This revision refactors the implementation of mapping to threads to additionally allow warps and linear ids to be specified. `warp_dims` is currently specified along with `block_dims` as a transform attribute. Linear ids, on the other hand, use the flattened block_dims to predicate on the first (linearized) k threads. An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme. Various implementation and transform op semantics cleanups are also applied. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D146130	2023-03-20 01:05:32 -07:00
Nicolas Vasilache	aafb52d7c9	[mlir][GPUTransforms] NFC - Refactor GPUTransforms.cpp in preparation for improvements. Depends on: D145977 Differential Revision: https://reviews.llvm.org/D145980	2023-03-14 05:00:01 -07:00
Nicolas Vasilache	1cff4cbda3	[mlir][Transform] NFC - Various API cleanups and use RewriterBase in lieu of PatternRewriter Depends on: D145685 Differential Revision: https://reviews.llvm.org/D145977	2023-03-14 04:23:12 -07:00
Alexander Belyaev	eb2f946e78	[mlir][scf] Rename ForeachThreadOp->ForallOp, PerformConcurrentlyOp->InParallelOp. Differential Revision: https://reviews.llvm.org/D144242	2023-02-17 09:59:39 +01:00
Thomas Raoux	a7686db801	[mlir][gpu] Allow distributing to different level of IDs without failing Change map_nested_foreach_to_threads to ignore foreach_thread not mapping to threads, this will allow us to call mapNestedForeachToThreadsImpl with different set of ids to lower multiple levels. Also adds warpIds attributes. Differential Revision: https://reviews.llvm.org/D143298	2023-02-04 02:03:05 +00:00
Nicolas Vasilache	e1c5cbc09c	[mlir][Linalg] Put a proper type on transform.structured.match op This allows much better verification messages in consuming ops that properly declare `TransformHandleTypeInterface` on their operands. Downstream tests can be updated with a command resembling: ``` git grep -l "structured\.match" mlir/test | xargs -i sed -i {} -e "s/\(structured.match.*\)/\1 : (\!pdl.operation) -> \!pdl.operation/g" ``` Differential Revision: https://reviews.llvm.org/D142643	2023-01-26 08:51:34 -08:00
Thomas Raoux	794979ad8c	[mlir][gpu] Improve foreach_thread distribution Replace Ids with 0 when block dim is 1 when distributing foreach_thread. Differential Revision: https://reviews.llvm.org/D141718	2023-01-17 17:12:55 +00:00
Guray Ozen	63ca939783	[mlir] [transform] Fix for RAUW error in transform gpu dialect The given test fails due to error below. The following error is why the test is failing. One `memref.store` and two `memref.load` are consumers of the loop index for which I do RAUW. `memref.store` is first in the list. If I RAUW on this the loop of `llvm::make_early_inc_range(threadIdx.getUsers())` does not return two `memref.load` as users. They remain unchanged. I'm not really certain why. This change applies RAUW after collecting the users. If a better solution exists, I would be happy to implement it. ``` mlir-opt: ...llvm-project/mlir/include/mlir/IR/UseDefLists.h:175: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. ``` Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D138029	2022-11-16 09:55:24 +01:00
Nicolas Vasilache	f0a411da77	[mlir][Transform] Significantly cleanup scf.foreach_thread and GPU transform permutation handling Previously, the need for a dense permutation leaked into the thread_dim_mapping specification. This revision allows using a sparse specification of the thread_dim_mapping, and the proper completion / sorting is applied automatically. In the process, the semantics of scf.foreach_thread are tightened to require a matching number of thread dimensions and mappings. The relevant negative test is added. Differential Revision: https://reviews.llvm.org/D137906	2022-11-14 09:19:49 -08:00
Guray Ozen	6663f34704	[mlir] Introduce device mapper attribute for `thread_dim_map` and `mapped to dims` `scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it. ``` scf.foreach_thread (%i, %j) in (%c1, %c2) { scf.foreach_thread (%i2, %j2) in (%c1, %c2) {...} { thread_dim_mapping = [0, 1]} } { thread_dim_mapping = [0, 1]} ``` It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation. The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x,y,z>` ). After this change, the above code prints as below, which clarifies the loop mappings. The change also implements consuming of these two new attributes by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads. ``` scf.foreach_thread (%i, %j) in (%c1, %c2) { scf.foreach_thread (%i2, %j2) in (%c1, %c2) {...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]} } { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]} ``` Reviewed By: ftynse, nicolasvasilache Differential Revision: https://reviews.llvm.org/D137413	2022-11-11 08:44:57 +01:00
Alex Zinenko	2e9abc0c71	[mlir] drop unnecssary transform.with_pdl_patterns from tests, NFC Many tests wrap the piece of the IR related to the transform dialect into `transform.with_pdl_patterns` without actually using PDL patterns inside. Some of these are leftovers from migration to `structured.match` and some others are cargo cult, both are useless and pollute the tests. Reviewed By: guraypp Differential Revision: https://reviews.llvm.org/D135661	2022-10-11 12:26:11 +00:00
Alex Zinenko	6fe0309602	[mlir] switch transform dialect ops to use TransformTypeInterface Use the recently introduced TransformTypeInterface instead of hardcoding the PDLOperationType. This will allow the operations to use more specific transform types to express pre/post-conditions in the future. It requires the syntax and Python op construction API to be updated. Dialect extensions will be switched separately. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D135584	2022-10-11 09:55:13 +00:00
Guray Ozen	89bb0cae46	[mlir][transform] Create GPU transform dialect This revision adds GPU transform dialect. It also introduces a prefix such as "transform.gpu" for all ops related to this dialect. MLIR already had two GPU transform ops in linalg. This revision moves these ops into GPUTransformOps. The ops are as follows: `transform.structured.map_nested_foreach_thread_to_gpu_blocks` -> `transform.gpu.map_foreach_to_blocks` This op selects the outermost (toplevel) foreach_thread and parallelizes it across GPU blocks. It can also generate `gpu_launch`. `transform.structured.map_nested_foreach_thread_to_gpu_threads` -> `transform.gpu.map_nested_foreach_to_threads` This op parallelizes nested foreach_thread ops that are inside `gpu_launch` across GPU threads. It doesn't add new functionality, but there is some minor refactoring of the code. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D134800	2022-10-04 13:09:08 +02:00

20 Commits