llvm-project

Author	SHA1	Message	Date
Sang Ik Lee	7fc792cba7	[MLIR] Enable GPU Dialect to SYCL runtime integration (#71430) GPU Dialect lowering to SYCL runtime is driven by spirv.target_env attached to gpu.module. As a result, spirv.target_env remains as an input to LLVMIR Translation. A SPIRVToLLVMIRTranslation without any actual translation is added to avoid an unregistered error in mlir-cpu-runner. SelectObjectAttr.cpp is updated to 1) pass the binary size argument to getModuleLoadFn and 2) pass the parameter count to getKernelLaunchFn. This change does not impact CUDA and ROCM usage, since both mlir_cuda_runtime and mlir_rocm_runtime are already updated to accept and ignore the extra arguments.	2023-12-05 16:55:24 -05:00
Guray Ozen	edf5cae739	[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871) NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialect, such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below: ``` gpu.launch_func @kernel_module::@kernel clusters in (%1, %0, %0) // <-- Optional blocks in (%0, %0, %0) threads in (%0, %0, %0) ``` The PR also introduces index and dimension Ops specific to clusters, binding them to NVVM Ops: ``` %cidX = gpu.cluster_id x %cidY = gpu.cluster_id y %cidZ = gpu.cluster_id z %cdimX = gpu.cluster_dim x %cdimY = gpu.cluster_dim y %cdimZ = gpu.cluster_dim z ``` We will introduce cluster support in the `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.	2023-11-27 11:05:07 +01:00
Fabian Mora	5093413a50	[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220) This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were performed: 1. Adding a format field to the GPU object attribute, allowing the translation attribute to use the correct runtime function to load the module. Likewise, a dictionary attribute was added to hold any extra options. 2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string. 3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD. 4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.	2023-09-14 18:00:27 -04:00
Nicolas Vasilache	7c4e8c6a27	[mlir] Disentangle dialect and extension registrations. This revision avoids the registration of dialect extensions in Pass::getDependentDialects. Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is always guaranteed to return false for extensions (i.e. there is no mechanism to track whether a lambda is already in the list of already registered extensions). When the context is already in a multi-threaded mode, this is guaranteed to assert. Arguably a more structured registration mechanism for extensions with a unique ExtensionID could be envisioned in the future. In the process of cleaning this up, multiple usage inconsistencies surfaced around the registration of translation extensions that this revision also cleans up. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D157703	2023-08-22 00:40:09 +00:00
Fabian Mora	b43068e870	[mlir][gpu] Update GPU translation to accept binaries. == Commit message == Modifies GPU translation to accept GPU binaries, embedding them using the object manager interface method `embedBinary`, as well as kernel launch operations, translating them using the interface method `launchKernel`. Depends on D154152 == Explanation == Summary: These patches aim to replace the current GPU compilation infrastructure, with extensibility and minimizing future disruption as the primary goals. The biggest updates performed by these patches are: - The introduction of Target attributes. These attributes handle compilation of GPU modules into binary strings and can be implemented by any dialect, leaving the option for downstream users to implement their own serializations. - The introduction of the GPU binary operation. This operation stores GPU objects for different targets and can be invoked by `gpu.launch_func`. - Making `gpu.binary` & `gpu.launch_func` translatable to LLVM IR, with the translation being controlled by Object Manager attributes. - The introduction of the `gpu-module-to-binary` pass. This pass serializes GPU modules into GPU binaries, using the GPU targets available in the module. - The introduction of the `#gpu.select_object` object manager as the default object manager. It selects a single object for embedding in the IR; by default it selects the first object. These patches leave the current infrastructure in place, allowing for a migration period for downstream users. Examples: - GPU modules using target attributes: ``` gpu.module @my_module [#gpu.nvptx<chip = "sm_90">, #gpu.amdgpu, #gpu.amdgpu<chip = "gfx90a">] { ... } ``` - Applying the `gpu-module-to-binary` pass: ``` gpu.module @my_module [#gpu.nvptx<chip = "sm_90">, #gpu.amdgpu] { ... 
} ; mlir-opt --gpu-module-to-binary gpu.binary @my_module [#gpu.object<#gpu.nvptx<chip = "sm_90">, "BINARY DATA">, #gpu.object<#gpu.amdgpu, "BINARY DATA">] ``` - Choosing the `#gpu.amdgpu` object for embedding: ``` gpu.binary @my_module <#gpu.select_object<#gpu.amdgpu>> [#gpu.object<#gpu.nvptx<chip = "sm_90">, "BINARY DATA">, #gpu.object<#gpu.amdgpu, "BINARY DATA">] ; It's also valid to pass the index of the object. gpu.binary @my_module <#gpu.select_object<1>> [#gpu.object<#gpu.nvptx<chip = "sm_90">, "BINARY DATA">, #gpu.object<#gpu.amdgpu, "BINARY DATA">] ``` Testing: This infrastructure was tested on two systems, one with an NVIDIA V100 and the other with an AMD MI250X; in both cases the tests completed successfully. Input files: - test.cpp {F28084155} - test_nvvm.mlir {F28084157} - test_rocdl.mlir {F28084162} 1. Steps for assembling the test for the NVIDIA system: ``` mlir-opt --gpu-to-llvm --gpu-module-to-binary test_nvvm.mlir | mlir-translate --mlir-to-llvmir -o test_nvptx.ll clang++ test_nvptx.ll test.cpp -l ``` Output file: test_nvptx.ll {F28084210} 2. Steps for assembling the test for the AMD system: ``` mlir-opt --gpu-to-llvm --gpu-module-to-binary test_rocdl.mlir | mlir-translate --mlir-to-llvmir -o test_amdgpu.ll clang++ test_amdgpu.ll test.cpp -l ``` Output file: test_amdgpu.ll {F28084217} == Diff list == The following patches implement the proposal described in: https://discourse.llvm.org/t/rfc-extending-mlir-gpu-device-codegen-pipeline/70199/54 : - D154098: Add a `GlobalSymbol` trait. - D154097: Add a parameter for passing default values to `StringRefParameter` - D154100: Adds a utility class for serializing operations to binary strings. - D154104: Add GPU target attribute interface. - D154113: Add target attribute to GPU modules. - D154117: Adds the NVPTX target attribute. - D154129: Adds the AMDGPU target attribute. - D154108: Add the GPU object manager attribute interface. - D154132: Add `gpu.binary` op and `#gpu.object` attribute. 
- D154137: Modifies `gpu.launch_func` to allow lowering it after gpu-to-llvm. - D154147: Add the Select Object compilation attribute. - D154149: Add the `gpu-module-to-binary` pass. - D154152: Add GPU target support to `gpu-to-llvm`. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154153	2023-08-12 00:29:42 +00:00
Fabian Mora	8ae074b195	[mlir][gpu] Add the Select Object compilation attribute. For an explanation of these patches see D154153. Commit message: This patch adds the default offloading handler for GPU binary ops, `#gpu.select_object`. It selects the object to embed based on an index or a target attribute, embeds the object as a global string, and launches the kernel using the scheme used in the GPU to LLVM pass. Depends on D154137 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154147	2023-08-11 22:00:35 +00:00
Matthias Springer	61223c49dd	[mlir][GPU] Rename MLIRGPUOps CMake target to MLIRGPUDialect This is for consistency with other dialects. Differential Revision: https://reviews.llvm.org/D150659	2023-05-16 14:25:08 +02:00
Sergio Afonso	0e9523efda	[mlir] Support lowering of dialect attributes attached to top-level modules This patch supports the processing of dialect attributes attached to top-level module-type operations during MLIR-to-LLVMIR lowering. This approach modifies the `mlir::translateModuleToLLVMIR()` function to call `ModuleTranslation::convertOperation()` on the top-level operation, after its body has been lowered. This, in turn, will get the `LLVMTranslationDialectInterface` object associated to that operation's dialect before trying to use it for lowering prior to processing dialect attributes attached to the operation. Since there are no `LLVMTranslationDialectInterface`s for the builtin and GPU dialects, which define their own module-type operations, this patch also adds and registers them. The requirement for always calling `mlir::registerBuiltinDialectTranslation()` before any translation of MLIR to LLVM IR where builtin module operations are present is introduced. The purpose of these new translation interfaces is to succeed when processing module-type operations, allowing the lowering process to continue and to prevent the introduction of failures related to not finding such interfaces. Differential Revision: https://reviews.llvm.org/D145932	2023-03-21 12:54:26 +00:00

8 Commits