llvm-project

Author	SHA1	Message	Date
Guray Ozen	edf5cae739	[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871) NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below: ``` gpu.launch_func @kernel_module::@kernel clusters in (%1, %0, %0) // <-- Optional blocks in (%0, %0, %0) threads in (%0, %0, %0) ``` The PR also introduces index and dimension Ops specific to clusters, binding them to NVVM Ops: ``` %cidX = gpu.cluster_id x %cidY = gpu.cluster_id y %cidZ = gpu.cluster_id z %cdimX = gpu.cluster_dim x %cdimY = gpu.cluster_dim y %cdimZ = gpu.cluster_dim z ``` We will introduce cluster support in `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.	2023-11-27 11:05:07 +01:00
Kazu Hirata	0a81ace004	[mlir] Use std::optional instead of llvm::Optional (NFC) This patch replaces (llvm::\|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 01:25:58 -08:00
Krzysztof Drewniak	be575c5dfc	Re-land D139865 "Add known_block_size and known_grid_size to gpu.func" This should fix the MSVC warning that caused the previous revert. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D140766	2023-01-02 16:39:00 +00:00
Stella Stamenova	828b4762ca	Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func" This reverts commit 85e38d7cd670371206f6067772dc822049d2cbd8. This broke the windows mlir buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/30180/steps/6/logs/stdio	2022-12-23 17:29:42 -08:00
Krzysztof Drewniak	85e38d7cd6	[mlir][GPU] Add known_block_size and known_grid_size to gpu.func In many cases, the number of workgroups (the grid size) and the number of workitems within each group (the block size) that a GPU kernel will be launched with are known. For example, if gpu.launch is called with constant block and grid sizes, we know that those are the only possible sizes that will be used to launch that kernel. In other cases, a custom code-generation pipeline that eventually produces GPU kernels may know the launch dimensions of those kernels, or at least may be able to provide an upper bound on them. Other GPU programming systems, such as OpenCL, allow capturing such information to enable compiler optimizations - see reqd_work_group_size, but MLIR currently has no mechanism for doing so. This set of attributes is the first step in enabling optimizations based on the known launch dimensions of kernels. It extends the kernel outline pass to set these bounds on kernels with constant launch dimensions and extends integer range inference for GPU index operations to account for the bounds when they are known. Subsequent revisions will use this data when lowering GPU operations to the ROCDL dialect. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D139865	2022-12-22 21:41:46 +00:00
River Riddle	10c04f4641	[mlir:GPU][NFC] Update GPU API to use prefixed accessors This doesn't flip the switch for prefix generation yet, that'll be done in a followup.	2022-09-30 15:27:10 -07:00
Mehdi Amini	28c17a4b06	Apply clang-tidy fixes for performance-unnecessary-value-param in InferIntRangeInterfaceImpls.cpp (NFC)	2022-09-01 14:50:14 +00:00
Christian Sigg	3e01af093f	[mlir] Add InferIntRangeInterface to gpu.launch Infers block/grid dimensions/indices or ranges of such dimensions/indices. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D129036	2022-07-05 07:14:54 +02:00

8 Commits