This fixes a few issues present in the current version:
1) The macro doesn't enforce the default visibility on exported
functions, causing compilation to fail when using
`-fvisibility=hidden`
2) Not all functions are exported
3) Sometimes the macro ends up awkwardly interleaved with `extern "C"`
declarations (see the sketch below)
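For illustration only (the macro and function names below are hypothetical, not the ones touched by this change), an export macro that addresses point 1 spells out the default visibility explicitly, so that builds using `-fvisibility=hidden` still expose the annotated functions:
```
// Hypothetical macro/function names, shown only to illustrate the idea.
#if defined(_WIN32) || defined(__CYGWIN__)
#define EXAMPLE_EXPORTED __declspec(dllexport)
#else
#define EXAMPLE_EXPORTED __attribute__((visibility("default")))
#endif

// Applying the macro directly on each declaration inside the extern "C"
// block avoids the awkward interleaving mentioned in point 3.
extern "C" {
EXAMPLE_EXPORTED void exampleExportedFunction(void);
}
```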
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
The NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.
This PR enables support for CGA within `gpu.launch_func` in the GPU
dialect by extending the op to accommodate this functionality. Since the
GPU dialect remains architecture-agnostic, the CGA support is added as
optional parameters. This lets us leverage existing GPU dialect
mechanisms such as kernel outlining and kernel launching, making it a
practical and convenient choice.
An example of this implementation can be seen below:
```
gpu.launch_func @kernel_module::@kernel
    clusters in (%1, %0, %0) // <-- Optional
    blocks in (%0, %0, %0)
    threads in (%0, %0, %0)
```
The PR also introduces cluster-specific index and dimension Ops, binding
them to NVVM Ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
We will introduce cluster support in the `gpu.launch` Op in an upcoming PR.
See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
Using the MLIR CMake package in a downstream project fails with the following error:
```
CMake Error at D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake:2537 (message):
The imported target "mlir_arm_sme_abi_stubs" references the file
"D:/projs/llvm/llvm-install/lib/mlir_arm_sme_abi_stubs.lib"
but this file does not exist. Possible reasons include:
* The file was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and contained
"D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake"
but not all the files it references.
Call Stack (most recent call first):
D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRConfig.cmake:37 (include)
mlir/CMakeLists.txt:5 (find_package)
```
CMake on Windows needs export libraries, but it seems they are only
generated if there is at least one exported symbol.
Add export attributes to symbols.
I'm not sure what the best approach to fixing this is (probably we
should just disable this library on Windows entirely), but this fixed
things for me locally.
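As a minimal sketch of the fix (the macro and symbol names here are illustrative, not the actual stub library's), marking symbols as `dllexport` on Windows is what makes the toolchain emit the `.lib` import library that the exported CMake target references:
```
// Illustrative sketch only, not the actual mlir_arm_sme_abi_stubs code.
#if defined(_WIN32)
#define STUBS_EXPORTED __declspec(dllexport)
#else
#define STUBS_EXPORTED __attribute__((visibility("default")))
#endif

// At least one dllexport'ed definition is needed for the linker to produce
// an import library (.lib) alongside the DLL on Windows.
extern "C" STUBS_EXPORTED void stubsExampleSymbol(void) {}
```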
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to using the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
This PR guards the driver call with an if-statement, since driver calls
are more expensive.
As a future TODO, the if-statement could be generated by the compiler
and thus optimized in some cases.
Printing strings within integration tests is currently quite annoyingly
verbose, and can't be tucked into shared helpers as the types depend on
the length of the string:
```
llvm.mlir.global internal constant @hello_world("Hello, World!\00")
func.func @entry() {
  %0 = llvm.mlir.addressof @hello_world : !llvm.ptr<array<14 x i8>>
  %1 = llvm.mlir.constant(0 : index) : i64
  %2 = llvm.getelementptr %0[%1, %1]
    : (!llvm.ptr<array<14 x i8>>, i64, i64) -> !llvm.ptr<i8>
  llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
  return
}
```
So this patch adds a simple extension to `vector.print` to simplify
this:
```
func.func @entry() {
  // Print a vector of characters ;)
  vector.print str "Hello, World!"
  return
}
```
Most of the logic for this is now shared with `cf.assert` which already
does something similar.
Depends on #68694
This final revision completes the migration to non-permutation support in
the SparseTensor library. All mappings are now controlled by the MapRef
(forward and backward). Unused code has been removed, which simplifies
subsequent testing of block sparsity.
This cleans up all external entry points that will have to deal with
non-permutations, making any subsequent refactoring much more local to
the lib files.
Making the materialize-from-reader method part of the Swiss army knife
suite again removes a lot of redundant boilerplate code and unifies the
parameter setup into a single centralized utility. Furthermore, we have
now minimized the number of entry points into the library that need a
non-permutation map setup, simplifying what comes next.
This revision refactors all "Swiss army knife" entry points to pass
dim2lvl/lvl2dim mapping, so that the callee can construct a MapRef
(shown for SparseTensorStorage class). This is a next step towards
completely centralizing mapping code into a single MapRef class.
This completely centralizes all setup related to dim2lvl and lvl2dim
for the runtime library (and even parts of direct IR codegen) into one
place! All of this is compatible with the MapRef data structure that
should be used in all remaining clients of dim2lvl and lvl2dim.
NOTE: the convert_x2y.mlir tests were becoming too overloaded, so I
decided to bring them back to the basics; if, e.g., more coverage of the
foreach is required, they should go into isolated smaller tests.
This revision introduces a MapRef, which will support a future
generalization beyond permutations (e.g. block sparsity). This revision
also unifies the conversion/codegen paths for the sparse_tensor.new
operation from file (e.g., the readers). Note that more unification is
planned as well as general affine dim2lvl and lvl2dim (all marked with
TODOs).
Replace some uses of `Type::getPointerTo` in two ways:
* Remove entirely if it's only used to support an unnecessary bitcast
(remove the bitcast as well).
* Replace with `PointerType::get`/`PointerType::getUnqual`
This is part of the NFC opaque pointer clean-up effort.
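A minimal sketch of the second kind of replacement (the surrounding code is illustrative, not taken from the patch):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

// With opaque pointers the pointee type no longer matters, so the pointer
// type can be built from the context alone (plus an address space if needed).
llvm::PointerType *makePointerTypes(llvm::LLVMContext &Ctx) {
  // Before: llvm::Type::getInt8Ty(Ctx)->getPointerTo();
  llvm::PointerType *Ptr = llvm::PointerType::getUnqual(Ctx);
  // Variant for a non-default address space.
  llvm::PointerType *PtrAS1 = llvm::PointerType::get(Ctx, /*AddressSpace=*/1);
  (void)PtrAS1;
  return Ptr;
}
```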
This PR is a breakdown of the big PR
https://github.com/llvm/llvm-project/pull/65539, which enables Intel GPU
integration. In this PR we pass the hostShared flag to the runtime
wrappers (required by the SyclRuntimeWrappers, which will come in a
subsequent PR) to indicate whether the allocation is done on host-shared
GPU memory or device-only memory.
This PR is a breakdown of the big PR #65539, which enables Intel GPU
integration. In this PR we pass the parameter count and the size of the
GPU binary to the runtime wrappers, since the SyclRuntimeWrappers (which
will come in a subsequent PR) require the SPIR-V size for compilation
and the number of parameters to iterate over the params.
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD (see the sketch after
this list).
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
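As a rough, hypothetical sketch of what point 3 boils down to (illustrative only; the actual `mgpuModuleLoadJIT` wrapper may differ), JIT-loading on NVIDIA targets goes through the CUDA driver's `cuModuleLoadDataEx`:
```
#include <cstdint>
#include "cuda.h"

// Illustrative sketch only, not the actual runtime wrapper code.
CUmodule loadModuleJIT(void *ptxData, int optLevel) {
  CUmodule module;
  CUjit_option options[] = {CU_JIT_OPTIMIZATION_LEVEL};
  void *optionValues[] = {
      reinterpret_cast<void *>(static_cast<intptr_t>(optLevel))};
  // cuModuleLoadDataEx JIT-compiles the PTX for the device that is current
  // at load time; there is no equivalent driver entry point for AMD.
  cuModuleLoadDataEx(&module, ptxData, /*numOptions=*/1, options,
                     optionValues);
  return module;
}
```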
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
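A rough sketch of the mechanical change on the downstream side (the surrounding declarations are illustrative only):
```
#include "llvm/Support/CodeGen.h"

// Before: llvm::CodeGenOpt::Aggressive and llvm::CGFT_ObjectFile.
// After: the scoped enumerations below.
llvm::CodeGenOptLevel OptLevel = llvm::CodeGenOptLevel::Aggressive;
llvm::CodeGenFileType FileType = llvm::CodeGenFileType::ObjectFile;
```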
Rationale:
These libraries provided COO input and output at external boundaries,
functionality which has since been generalized to the much more powerful
pack and unpack operations of the sparse tensor dialect.
This cleans up unnecessary code that changes zero-size allocations to
avoid the following error message:
'cuMemAlloc(&ptr, sizeBytes)' failed with 'CUDA_ERROR_INVALID_VALUE'
Incorporated two header files directly into others, since other parts
were used (and it makes it hard to find the definitions). Removed TODOs
that are less likely to be done.
Reviewed By: yinying-lisa-li
Differential Revision: https://reviews.llvm.org/D159381
Incorporated two header files directly into others, since other parts
were used (and it makes it hard to find the definitions). Removed TODOs
that are less likely to be done.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D159330
Consistent order of ops and related methods.
Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp
since this is a general utility for sparse matrices,
not specific to GEMM ops only.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D157922
Rationale:
Since we only support the default algorithm for SpGEMM, we can remove the
estimate op (for now at least). This also introduces the set csr pointers
op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown.
This revision paves the way for actual recognition of SpGEMM in the sparsifier.
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D157645