llvm-project

Author	SHA1	Message	Date
Krzysztof Drewniak	43fd4c49bd	[mlir][GPU] Improve handling of GPU bounds (#95166) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu.*_{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known_*_bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly.
--------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-06-17 23:47:38 -05:00
Pradeep Kumar	bd6568c98a	[MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops (#95245) This commit adds support for `gpu.cluster_dim_blocks` and `gpu.cluster_block_id` Ops to represent number of blocks per cluster and block id inside a cluster respectively. Also, fixed the description of `gpu.cluster_dim` Op and updated the `cga_cluster.mlir` test file to use `gpu.cluster_dim_blocks`. Co-authored-by: pradeepku <pradeepku@nvidia.com> Co-authored-by: Guray Ozen <guray.ozen@gmail.com>	2024-06-14 10:35:35 +05:30
Guray Ozen	edf5cae739	[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871) NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below: ``` gpu.launch_func @kernel_module::@kernel clusters in (%1, %0, %0) // <-- Optional blocks in (%0, %0, %0) threads in (%0, %0, %0) ``` The PR also introduces index and dimensions Ops specific to clusters, binding them to NVVM Ops: ``` %cidX = gpu.cluster_id x %cidY = gpu.cluster_id y %cidZ = gpu.cluster_id z %cdimX = gpu.cluster_dim x %cdimY = gpu.cluster_dim y %cdimZ = gpu.cluster_dim z ``` We will introduce cluster support in `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.	2023-11-27 11:05:07 +01:00
Kazu Hirata	0a81ace004	[mlir] Use std::optional instead of llvm::Optional (NFC) This patch replaces (llvm::\|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 01:25:58 -08:00
Krzysztof Drewniak	be575c5dfc	Re-land D139865 "Add known_block_size and known_grid_size to gpu.func" This should fix the MSVC warning that caused the previous revert. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D140766	2023-01-02 16:39:00 +00:00
Stella Stamenova	828b4762ca	Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func" This reverts commit 85e38d7cd670371206f6067772dc822049d2cbd8. This broke the windows mlir buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/30180/steps/6/logs/stdio	2022-12-23 17:29:42 -08:00
Krzysztof Drewniak	85e38d7cd6	[mlir][GPU] Add known_block_size and known_grid_size to gpu.func In many cases, the number of workgroups (the grid size) and the number of workitems within each group (the block size) that a GPU kernel will be launched with are known. For example, if gpu.launch is called with constant block and grid sizes, we know that those are the only possible sizes that will be used to launch that kernel. In other cases, a custom code-generation pipeline that eventually produces GPU kernels may know the launch dimensions of those kernels, or at least may be able to provide an upper bound on them. Other GPU programming systems, such as OpenCL, allow capturing such information to enable compiler optimizations - see reqd_work_group_size, but MLIR currently has no mechanism for doing so. This set of attributes is the first step in enabling optimizations based on the known launch dimensions of kernels. It extends the kernel outline pass to set these bounds on kernels with constant launch dimensions and extends integer range inference for GPU index operations to account for the bounds when they are known. Subsequent revisions will use this data when lowering GPU operations to the ROCDL dialect. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D139865	2022-12-22 21:41:46 +00:00
River Riddle	10c04f4641	[mlir:GPU][NFC] Update GPU API to use prefixed accessors This doesn't flip the switch for prefix generation yet, that'll be done in a followup.	2022-09-30 15:27:10 -07:00
Mehdi Amini	28c17a4b06	Apply clang-tidy fixes for performance-unnecessary-value-param in InferIntRangeInterfaceImpls.cpp (NFC)	2022-09-01 14:50:14 +00:00
Christian Sigg	3e01af093f	[mlir] Add InferIntRangeInterface to gpu.launch Infers block/grid dimensions/indices or ranges of such dimensions/indices. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D129036	2022-07-05 07:14:54 +02:00

10 Commits
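
The lookup order described in commit 43fd4c49bd (`upper_bound` > inherent attribute > discardable attribute, with dimension ops checking the known sizes first) can be modelled roughly as below. This is a minimal Python sketch of the precedence rule only, not the MLIR implementation: the function names `id_bound` and `dim_bound`, the tuple return shape, and the choice to consult the inherent attribute before the discardable one for dimension ops are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def id_bound(upper_bound: Optional[int],
             inherent_size: Optional[int],
             discardable_size: Optional[int]) -> Optional[int]:
    """Hypothetical helper: choose the bound for an ID op (e.g. gpu.thread_id).

    Sources are consulted from most to least specific, as the commit
    message describes: upper_bound, inherent attribute, discardable attribute.
    """
    for candidate in (upper_bound, inherent_size, discardable_size):
        if candidate is not None:
            return candidate
    return None  # no local or contextual information is available


def dim_bound(upper_bound: Optional[int],
              inherent_size: Optional[int],
              discardable_size: Optional[int]) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: choose the bound for a dimension op (e.g. gpu.block_dim).

    A known size is checked first because it allows constant folding;
    only if none is present does the local upper_bound apply.
    Checking inherent before discardable here is an assumption.
    """
    for known in (inherent_size, discardable_size):
        if known is not None:
            return ("constant", known)
    if upper_bound is not None:
        return ("at_most", upper_bound)
    return None


if __name__ == "__main__":
    # An ID op with a local upper_bound ignores the enclosing function's sizes.
    print(id_bound(64, 128, 256))
    # A dimension op prefers the known size over its own upper_bound.
    print(dim_bound(64, 128, None))
```

The two helpers are kept separate because the commit message states the ordering differs between ID ops and dimension ops; everything beyond that ordering is left out deliberately.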