llvm-project

Author	SHA1	Message	Date
Aart Bik	8998bcfbce	[mlir][sparse][gpu] refine type of workspace size variables (#66438) Rationale: Some compiler settings don't like the size_t vs uint64_t setup.	2023-09-14 15:49:52 -07:00
Fabian Mora	5093413a50	[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220) This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were performed: 1. Adding a format field to the GPU object attribute, allowing the translation attribute to use the correct runtime function to load the module. Likewise, a dictionary attribute was added to add any possible extra options. 2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string. 3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD. 4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.	2023-09-14 18:00:27 -04:00
Aart Bik	3635c74375	[mlir][gpu][sparse] gracefully accept zero size allocation (#66127) This cleans up unnecessary code that changed zero-size allocations to avoid the following error message: 'cuMemAlloc(&ptr, sizeBytes)' failed with 'CUDA_ERROR_INVALID_VALUE'	2023-09-12 13:07:24 -07:00
Guray Ozen	1dc0071216	[MLIR] Guard Cuda 12.0+ newer driver APIs with CUDA_VERSION macro checks Fixes #64529 https://github.com/llvm/llvm-project/issues/64529 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D159440	2023-09-06 08:17:06 +02:00
Aart Bik	289f7231f9	[mlir][sparse][gpu] minor code cleanup for sparse gpu ops Consistent order of ops and related methods. Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp since this is a general utility for sparse matrices, not specific to GEMM ops only. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D157922	2023-08-14 15:08:57 -07:00
Aart Bik	95a6c509c9	[mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugs Rationale: Since we only support default algorithm for SpGEMM, we can remove the estimate op (for now at least). This also introduces the set csr pointers op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown. This revision paves the way for actual recognition of SpGEMM in the sparsifier. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157645	2023-08-10 13:52:47 -07:00
Aart Bik	e7e4ed0d7a	[mlir][sparse][gpu] only support default algorithm for SpGEMM Rationale: This is the approach taken for all the others too (SpMV, SpMM, SDDMM), so it is more consistent to follow the same path (until we have a need for more algorithms). Also, in a follow up revision, this will allow us to remove some unused GEMM ops. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157542	2023-08-09 12:49:47 -07:00
Kun Wu	0664db5425	[mlir][sparse][gpu] fix spgemm runtime compile error Differential Revision: https://reviews.llvm.org/D157349	2023-08-08 01:37:31 +00:00
Kun Wu	dfe2942909	[mlir][sparse][gpu] add spgemm operator Differential Revision: https://reviews.llvm.org/D152981	2023-08-08 00:29:23 +00:00
Guray Ozen	53881490c2	[mlir][cuda runtime] Set Max Dynamic Shared Memory Attribute This work aims to address the issue related to larger shared memory usage in the MLIR CUDA runtime. Currently, when the shared memory usage exceeds 48KB, we need to set the CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES attribute of the CUDA kernel appropriately. This work takes care of that by setting the attribute as required. Additionally, it includes some debug prints for better visibility and troubleshooting. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D156874	2023-08-02 14:18:59 +02:00
Guray Ozen	19b1107963	[mlir][gpu] Add debug print with environment value This work introduces the `MLIR_CUDA_DEBUG` environment variable and a `debug_print` function to be able to debug runtimes. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D156232	2023-08-02 11:55:32 +02:00
Kun Wu	1e491c425b	[mlir][sparse][gpu] add 2:4 spmm prune_and_check flag Differential Revision: https://reviews.llvm.org/D155909	2023-08-01 18:24:18 +00:00
Guray Ozen	e56d6745f7	[mlir][nvgpu] Add `tma.create.descriptor` to create tensor map descriptor The Op creates a tensor map descriptor object representing a tiled memory region. The descriptor is used by the Tensor Memory Accelerator (TMA). The `tensor` is the source tensor to be tiled. The `boxDimensions` is the size of the tiled memory region in each dimension. The pattern here lowers `tma.create.descriptor` to a runtime function call that eventually calls the CUDA Driver's `cuTensorMapEncodeTiled`. For more information see below: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html Depends on D155453 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155680	2023-07-21 11:33:04 +02:00
Aart Bik	4df01dc270	[mlir][sparse][gpu][nvidia] add pruning step and check to 2:4 matrix multiplication (1) without the check, the results may silently be wrong, so a check is needed; (2) add pruning step to guarantee 2:4 property. Note, in the longer run, we may want to split out the pruning step somehow, or make it optional. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D155320	2023-07-14 12:08:13 -07:00
Aart Bik	97678cec1b	[mlir][sparse][gpu] remove zero init memset avoids quite a big memory fill for each setup Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D155251	2023-07-13 18:22:21 -07:00
Aart Bik	86eff489e7	[mlir][sparse][gpu] force 16-byte alignment on data structs for cuSparseLt Also makes some minor consistency edits in the cuSparseLt wrapper lib. Reviewed By: Peiming, K-Wu Differential Revision: https://reviews.llvm.org/D155139	2023-07-13 10:45:15 -07:00
Adrian Kuegel	f250fbcbbb	[mlir] Apply ClangTidy fix (NFC) The return statement is redundant.	2023-07-10 11:46:32 +02:00
Aart Bik	03125e6894	[mlir][sparse][gpu] fix missing dealloc This dealloc was incorrectly removed in https://reviews.llvm.org/D153173 Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D154564	2023-07-06 09:48:19 -07:00
Kun Wu	be2dd22b8f	[mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime Differential Revision: https://reviews.llvm.org/D153173	2023-06-30 21:52:34 +00:00
Kun Wu	7a3ebba9cb	[mlir][sparse][gpu] Add explaining string to three static_assert stmts Differential Revision: https://reviews.llvm.org/D154243	2023-06-30 14:10:45 -05:00
Kun Wu	632ccc538c	[mlir][sparse][gpu] remove tuple as one of the spmm_buffer_size output type Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D153188	2023-06-19 15:57:50 +00:00
Kun Wu	9167dd46ba	[mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D151582	2023-06-15 23:48:11 +00:00
Kun Wu	ac30f48e37	[mlir][sparse][gpu] fix various cusparseLt bugs Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D152489	2023-06-12 23:48:49 +00:00
Navdeep Katel	18cc07aa07	[MLIR][GPU] Add 16-bit version of cudaMemset in cudaRuntimeWrappers Add 16-bit version of cudaMemset in cudaRuntimeWrappers and update the GPU to LLVM lowering. Reviewed By: bondhugula Differential Revision: https://reviews.llvm.org/D151642	2023-06-08 17:33:26 +05:30
Aart Bik	50db4789a8	[mlir][sparse][gpu] refined build setup for cusparse Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D152387	2023-06-07 11:09:22 -07:00
Kun Wu	8ed59c53de	[mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D151775	2023-06-06 23:13:21 +00:00
Aart Bik	9fc02a7a08	[mlir][sparse][gpu] add AoS COO support to cuSPARSE Even though this feature was deprecated in release 11.2, any library before this version still supports the feature, which is why we are making it available under a macro. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D152290	2023-06-06 12:32:46 -07:00
Kun Wu	7e44f0736a	[mlir][gpu][sparse] fix broken type in cusparseCreateCsr Differential Revision: https://reviews.llvm.org/D151912	2023-06-01 18:06:09 +00:00
Kun Wu	be6c532005	[mlir][sparse][gpu] fixing broken literal names in cuda runner macros Differential Revision: https://reviews.llvm.org/D151910	2023-06-01 17:52:58 +00:00
Kun Wu	cc402de0b1	[mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path Differential Revision: https://reviews.llvm.org/D151592	2023-06-01 17:17:40 +00:00
Aart Bik	752c04777f	[mlir][sparse][gpu] fix merge conflict Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D151619	2023-05-27 13:42:20 -07:00
Kun Wu	cf44847b4d	[mlir][gpu][sparse] adding cusparse sddmm support Differential Revision: https://reviews.llvm.org/D151279	2023-05-27 20:01:41 +00:00
Aart Bik	74e29d3715	[mlir][sparse][gpu] fix merge conflict Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D151574	2023-05-26 11:00:20 -07:00
Kun Wu	235fbe792b	[mlir] [sparse] [gpu] adding transpose support to spmm spmv Reviewed By: aartbik, wrengr Differential Revision: https://reviews.llvm.org/D151259	2023-05-26 17:07:09 +00:00
Aart Bik	bcb698bfdc	[mlir][sparse][gpu] various cuSparse refinements (1) keep all cuSparse ops on single stream without wait() in right order (2) use more type precise memref types for COO (3) use ToTensor on resulting memref (even though it folds away again) Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D151404	2023-05-24 22:32:52 -07:00
Aart Bik	4ebd836d9e	[mlir][sparse][gpu] fix F32 bug for SpMV and SpMM The alpha/beta variables, residing on the host, should have the 32-bit or 64-bit width of the result type. It was formerly always passed as double. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D151255	2023-05-23 17:36:03 -07:00
Aart Bik	a8e1f80f8b	[mlir][sparse][gpu] derive type of cuSparse op This no longer assumes just F64 output. Note, however, that it will be cleaner to carry the data type in the corresponding operation (rather than tracking operands). That will also allow for mixed type cases, where operands and result type are different. This will be done in a follow-up revision where the result type is carried by the SpMV/SpMM op itself (and friends). Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D151005	2023-05-19 17:07:52 -07:00
Aart Bik	981cf1678d	[mlir][sparse][gpu] add SpMM to GPU ops dialect Reviewed By: ThomasRaoux, K-Wu Differential Revision: https://reviews.llvm.org/D150618	2023-05-19 12:46:11 -07:00
Aart Bik	b700a90cc0	[mlir][gpu][sparse] add gpu ops for sparse matrix computations This revision extends the GPU dialect with ops that can be lowered to host-oriented sparse matrix library calls (in this case cuSparse focused, although the ops could be generalized to support more GPUs in principle). This will allow the "sparse compiler pipeline" to accelerate sparse operations (see follow-up revisions with examples of this). For some background, see: https://discourse.llvm.org/t/sparse-compiler-and-gpu-code-generation/69786/2 Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D150152	2023-05-12 10:44:36 -07:00
max	8f7c8a6ea7	Add gpu::HostUnregisterOp Without explicitly unregistering you will get ``` 'cuMemHostRegister(ptr, sizeBytes, 0)' failed with 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED' ``` in CUDA (for example) after repeated runs (e.g., during benchmarking the same kernel). Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D147277	2023-04-06 15:07:12 -05:00
Mehdi Amini	6b7e6ea489	Revert "Fix CUDA runtime wrapper for GPU mem alloc/free to async" This reverts commit b4117fede20b8c649320ad37364ae208baa0d0e7. This broke one of the MLIR bot, a test is failing.	2022-04-12 06:50:27 +00:00
Uday Bondhugula	b4117fede2	Fix CUDA runtime wrapper for GPU mem alloc/free to async Switch CUDA runtime wrapper for GPU mem alloc/free to async. The semantics of the GPU dialect ops (gpu.alloc/dealloc) and the wrappers they lowered to (gpu-to-llvm) were for the async versions; however, this was being incorrectly mapped to cuMemAlloc/cuMemFree instead of cuMemAllocAsync/cuMemFreeAsync. Reviewed By: csigg Differential Revision: https://reviews.llvm.org/D123482	2022-04-12 09:04:02 +05:30
Krzysztof Drewniak	c5803ee4fa	[MLIR][GPU] Remove call to cudaSetDevice(), which no longer exists Differential Revision: https://reviews.llvm.org/D120085	2022-02-17 21:38:05 +00:00
Krzysztof Drewniak	84718d37db	[MLIR][GPU] Add gpu.set_default_device op This op is added to allow MLIR code running on multi-GPU systems to select the GPU they want to execute operations on when no GPU is otherwise specified. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D119883	2022-02-17 21:30:09 +00:00
Nicolas Vasilache	012c0cc7c3	[mlir] NFC - Avoid unused symbol in opt mode.	2021-10-14 11:26:33 +00:00
Loren Maggiore	361458b1ce	[mlir] create gpu memset op Create a gpu memset op and corresponding CUDA and ROCm wrappers. Reviewed By: herhut, lorenrose1013 Differential Revision: https://reviews.llvm.org/D107548	2021-09-04 08:13:04 +02:00
Aart Bik	b9f87e24f2	[mlir] add missing include, fix broken build Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D108873	2021-08-28 09:36:38 -07:00
Uday Bondhugula	4edc9e2acf	[MLIR][GPU] Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support. This method is the only one in the CUDA runtime wrappers library that creates a dependence on libLLVMSupport due to its use of SmallVector and ArrayRef. The code can be written just as easily and compactly without those ADTs. The dependence on LLVMSupport adds a significant amount of additional complexity for external things that want to link this library in (either statically or as a shared object) since libLLVMSupport includes numerous other objects that are sensitive to C++ compiler version and ABI. Differential Revision: https://reviews.llvm.org/D108684	2021-08-28 11:37:55 +05:30
Christian Sigg	f69d5a7fc7	[mlir] Initialize CUDA context lazily. So we can remove the ignore-warning pragma again. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D97864	2021-03-04 13:07:56 +01:00
Christian Sigg	b6ac26fce5	[mlir] Silence -Wglobal-constructors error in CudaRuntimeWrapper.cpp Until I have a better solution with dynamic initialization, to get the nvidia build bot green again.	2021-03-03 13:48:03 +01:00


51 Commits