51 Commits

Author SHA1 Message Date
Aart Bik
8998bcfbce
[mlir][sparse][gpu] refine type of workspace size variables (#66438)
Rationale:
Some compiler settings don't like the size_t vs uint64_t setup.
2023-09-14 15:49:52 -07:00
Fabian Mora
5093413a50
[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220)
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.

2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.

3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.

4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
2023-09-14 18:00:27 -04:00
Aart Bik
3635c74375
[mlir][gpu][sparse] gracefully accept zero size allocation (#66127)
This cleans up a unnecessary code that changes zero size allocation to
avoid the following error message

'cuMemAlloc(&ptr, sizeBytes)' failed with 'CUDA_ERROR_INVALID_VALUE'
2023-09-12 13:07:24 -07:00
Guray Ozen
1dc0071216 [MLIR] Guard Cuda 12.0+ newer driver APIs with CUDA_VERSION macro checks
Fixes #64529
https://github.com/llvm/llvm-project/issues/64529

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D159440
2023-09-06 08:17:06 +02:00
Aart Bik
289f7231f9 [mlir][sparse][gpu] minor code cleanup for sparse gpu ops
Consistent order of ops and related methods.
Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp
since this is a general utility for sparse matrices,
not specific to GEMM ops only.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D157922
2023-08-14 15:08:57 -07:00
Aart Bik
95a6c509c9 [mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugs
Rationale:
Since we only support default algorithm for SpGEMM, we can remove the
estimate op (for now at least). This also introduces the set csr pointers
op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown.
This revision paves the way for actual recognition of SpGEMM in the sparsifier.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157645
2023-08-10 13:52:47 -07:00
Aart Bik
e7e4ed0d7a [mlir][sparse][gpu] only support default algorithm for SpGEMM
Rationale:
This is the approach taken for all the others too (SpMV, SpMM, SDDMM),
so it is more consistent to follow the same path (until we have a need
for more algorithms). Also, in a follow up revision, this will allow
us to remove some unused GEMM ops.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157542
2023-08-09 12:49:47 -07:00
Kun Wu
0664db5425 [mlir][sparse][gpu] fix spgemm runtime compile error
Differential Revision: https://reviews.llvm.org/D157349
2023-08-08 01:37:31 +00:00
Kun Wu
dfe2942909 [mlir][sparse][gpu] add spgemm operator
Differential Revision: https://reviews.llvm.org/D152981
2023-08-08 00:29:23 +00:00
Guray Ozen
53881490c2 [mlir][cuda runtime] Set Max Dynamic Shared Memory Attribute
This works aims to address the issue related to larger shared memory usage in the MLIR CUDA runtime. Currently, when the shared memory usage exceeds 48KB, we need to set the CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES attribute of the CUDA kernel appropriately. This work takes care of that by setting the attribute as required. Additionally, it includes some debug prints for better visibility and troubleshooting.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D156874
2023-08-02 14:18:59 +02:00
Guray Ozen
19b1107963 [mlir][gpu] Add debug print with environment value
This work introduces `MLIR_CUDA_DEBUG` environment value and `debug_print` function to be able to debug runtimes.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D156232
2023-08-02 11:55:32 +02:00
Kun Wu
1e491c425b [mlir][sparse][gpu] add 2:4 spmm prune_and_check flag
Differential Revision: https://reviews.llvm.org/D155909
2023-08-01 18:24:18 +00:00
Guray Ozen
e56d6745f7 [mlir][nvgpu] Add tma.create.descriptor to create tensor map descriptor
The Op creates a tensor map descriptor object representing tiled memory region. The descriptor is used by Tensor Memory Access (TMA). The `tensor` is the source tensor to be tiled. The `boxDimensions` is the size of the tiled memory region in each dimension.

The pattern here lowers `tma.create.descriptor` to a runtime function call that eventually calls calls CUDA Driver's `cuTensorMapEncodeTiled`. For more information see below:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html

Depends on D155453

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155680
2023-07-21 11:33:04 +02:00
Aart Bik
4df01dc270 [mlir][sparse][gpu][nvidia] add pruning step and check to 2:4 matrix multiplication
(1) without the check, the results may silently be wrong, so check is needed
(2) add pruning step to guarantee 2:4 property

Note, in the longer run, we may want to split out the pruning step somehow,
or make it optional.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155320
2023-07-14 12:08:13 -07:00
Aart Bik
97678cec1b [mlir][sparse][gpu] remove zero init memset
avoids quite a big memory fill for each setup

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155251
2023-07-13 18:22:21 -07:00
Aart Bik
86eff489e7 [mlir][sparse][gpu] force 16-byte alignment on data structs for cuSparseLt
Also makes some minor consistency edits in the cuSparseLt wrapper lib.

Reviewed By: Peiming, K-Wu

Differential Revision: https://reviews.llvm.org/D155139
2023-07-13 10:45:15 -07:00
Adrian Kuegel
f250fbcbbb [mlir] Apply ClangTidy fix (NFC)
The return statement is redundant.
2023-07-10 11:46:32 +02:00
Aart Bik
03125e6894 [mlir][sparse][gpu] fix missing dealloc
This dealloc was incorrectly removed in
https://reviews.llvm.org/D153173

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D154564
2023-07-06 09:48:19 -07:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Kun Wu
7a3ebba9cb [mlir][sparse][gpu] Add explaining string to three static_assert stmts
Differential Revision: https://reviews.llvm.org/D154243
2023-06-30 14:10:45 -05:00
Kun Wu
632ccc538c [mlir][sparse][gpu] remove tuple as one of the spmm_buffer_size output type
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153188
2023-06-19 15:57:50 +00:00
Kun Wu
9167dd46ba [mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151582
2023-06-15 23:48:11 +00:00
Kun Wu
ac30f48e37 [mlir][sparse][gpu]fix various cusparseLt bugs
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152489
2023-06-12 23:48:49 +00:00
Navdeep Katel
18cc07aa07 [MLIR][GPU] Add 16-bit version of cudaMemset in cudaRuntimeWrappers
Add 16-bit version of cudaMemset in cudaRuntimeWrappers and update the GPU to LLVM lowering.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D151642
2023-06-08 17:33:26 +05:30
Aart Bik
50db4789a8 [mlir][sparse][gpu] refined build setup for cusparse
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152387
2023-06-07 11:09:22 -07:00
Kun Wu
8ed59c53de [mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151775
2023-06-06 23:13:21 +00:00
Aart Bik
9fc02a7a08 [mlir][sparse][gpu] add AoS COO support to cuSPARSE
Even though this feature was deprecated in release 11.2,
any library before this version still supports the feature,
which is why we are making it available under a macro.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152290
2023-06-06 12:32:46 -07:00
Kun Wu
7e44f0736a [mlir][gpu][sparse] fix broken type in cusparseCreateCsr
Differential Revision: https://reviews.llvm.org/D151912
2023-06-01 18:06:09 +00:00
Kun Wu
be6c532005 [mlir][sparse][gpu] fixing broken literal names in cuda runner macros
Differential Revision: https://reviews.llvm.org/D151910
2023-06-01 17:52:58 +00:00
Kun Wu
cc402de0b1 [mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path
Differential Revision: https://reviews.llvm.org/D151592
2023-06-01 17:17:40 +00:00
Aart Bik
752c04777f [mlir][sparse][gpu] fix merge conflict
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D151619
2023-05-27 13:42:20 -07:00
Kun Wu
cf44847b4d [mlir][gpu][sparse] adding cusparse sddmm support
Differential Revision: https://reviews.llvm.org/D151279
2023-05-27 20:01:41 +00:00
Aart Bik
74e29d3715 [mlir][sparse][gpu] fix merge conflict
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151574
2023-05-26 11:00:20 -07:00
Kun Wu
235fbe792b [mlir] [sparse] [gpu] adding transpose support to spmm spmv
Reviewed By: aartbik, wrengr

Differential Revision: https://reviews.llvm.org/D151259
2023-05-26 17:07:09 +00:00
Aart Bik
bcb698bfdc [mlir][sparse][gpu] various cuSparse refinements
(1) keep all cuSparse ops on single stream without wait() in right order
(2) use more type precise memref types for COO
(3) use ToTensor on resulting memref (even though it folds away again)

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D151404
2023-05-24 22:32:52 -07:00
Aart Bik
4ebd836d9e [mlir][sparse][gpu] fix F32 bug for SpMV and SpMM
The alpha/beta variables, residing on the host, should have the
32-bit or 64-bit width of the result type. It was formerly always
passed as double.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151255
2023-05-23 17:36:03 -07:00
Aart Bik
a8e1f80f8b [mlir][sparse][gpu] derive type of cuSparse op
This no longer assumes just F64 output.

Note, however, that it will be cleaner to carry the data type in the corresponding operation (rather than tracking operands). That will also allow for mixed type cases, where operands and result type are different

This will be done in a follow revision where the result type is carried by the SpMV/SpMM op itself (and friends).

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151005
2023-05-19 17:07:52 -07:00
Aart Bik
981cf1678d [mlir][sparse][gpu] add SpMM to GPU ops dialect
Reviewed By: ThomasRaoux, K-Wu

Differential Revision: https://reviews.llvm.org/D150618
2023-05-19 12:46:11 -07:00
Aart Bik
b700a90cc0 [mlir][gpu][sparse] add gpu ops for sparse matrix computations
This revision extends the GPU dialect with ops that can be lowered to
host-oriented sparse matrix library calls (in this case cuSparse focused
although the ops could be generalized to support more GPUs in principle).
This will allow the "sparse compiler pipeline" to accelerate sparse operations
(see follow up revisions with examples of this).

For some background;

https://discourse.llvm.org/t/sparse-compiler-and-gpu-code-generation/69786/2

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D150152
2023-05-12 10:44:36 -07:00
max
8f7c8a6ea7 Add gpu::HostUnregisterOp
Without explicitly unregistering you will get

```
'cuMemHostRegister(ptr, sizeBytes, 0)' failed with 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'
```

in CUDA (for example) after repeated runs (e.g., during benchmarking the same kernel).

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D147277
2023-04-06 15:07:12 -05:00
Mehdi Amini
6b7e6ea489 Revert "Fix CUDA runtime wrapper for GPU mem alloc/free to async"
This reverts commit b4117fede20b8c649320ad37364ae208baa0d0e7.
This broke one of the MLIR bot, a test is failing.
2022-04-12 06:50:27 +00:00
Uday Bondhugula
b4117fede2 Fix CUDA runtime wrapper for GPU mem alloc/free to async
Switch CUDA runtime wrapper for GPU mem alloc/free to async. The
semantics of the GPU dialect ops (gpu.alloc/dealloc) and the wrappers it
lowered to (gpu-to-llvm) was for the async versions -- however, this was
being incorrectly mapped to cuMemAlloc/cuMemFree instead of
cuMemAllocAsync/cuMemFreeAsync.

Reviewed By: csigg

Differential Revision: https://reviews.llvm.org/D123482
2022-04-12 09:04:02 +05:30
Krzysztof Drewniak
c5803ee4fa [MLIR][GPU] Remove call to cudaSetDevice(), which no longer exists
Differential Revision: https://reviews.llvm.org/D120085
2022-02-17 21:38:05 +00:00
Krzysztof Drewniak
84718d37db [MLIR][GPU] Add gpu.set_default_device op
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D119883
2022-02-17 21:30:09 +00:00
Nicolas Vasilache
012c0cc7c3 [mlir] NFC - Avoid unused symbol in opt mode. 2021-10-14 11:26:33 +00:00
Loren Maggiore
361458b1ce [mlir] create gpu memset op
Create a gpu memset op and corresponding CUDA and ROCm wrappers.

Reviewed By: herhut, lorenrose1013

Differential Revision: https://reviews.llvm.org/D107548
2021-09-04 08:13:04 +02:00
Aart Bik
b9f87e24f2 [mlir] add missing include, fix broken build
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D108873
2021-08-28 09:36:38 -07:00
Uday Bondhugula
4edc9e2acf [MLIR][GPU] Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support
Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support. This
method is the only one in CUDA runtime wrappers library that creates
a dependence on libLLVMSupport due to its use of SmallVector and
ArrayRef. The code can be as easily/compactly written without those ADT.
The dependence on LLVMSupport adds a significant amount of additional
complexity for external things that want to link this library in (both
statically or as a shared object) since libLLVMSupport includes numerous
other objects that are sensitive to C++ compiler version and ABI.

Differential Revision: https://reviews.llvm.org/D108684
2021-08-28 11:37:55 +05:30
Christian Sigg
f69d5a7fc7 [mlir] Initialize CUDA context lazily.
So we can remove the ignore-warning pragma again.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D97864
2021-03-04 13:07:56 +01:00
Christian Sigg
b6ac26fce5 [mlir] Silence -Wglobal-constructors error in CudaRuntimeWrapper.cpp
Until I have a better solution with dynamic initialization, to get
the nvidia build bot green again.
2021-03-03 13:48:03 +01:00