591 Commits

Author SHA1 Message Date
Adam Paszke
1c2a0768de
[MLIR][CUDA] Update export macros in CudaRuntimeWrappers (#73932)
This fixes a few issues present in the current version:
1) The macro doesn't enforce the default visibility on exported
   functions, causing compilation to fail when using
   `-fvisibility=hidden`
2) Not all functions are exported
3) Sometimes the macro ended up weirdly interleaved with `extern "C"`
   declarations
2023-11-30 14:57:39 +01:00
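The sketch below illustrates the pattern that commit 1c2a0768de describes. It assumes a GCC/Clang toolchain. `EXAMPLE_WRAPPER_EXPORT` and `mgpuExampleAdd` are hypothetical names, not the macro or functions the commit actually defines. The macro forces default visibility, and it always appears in the same position after `extern "C"`.

```cpp
// Hypothetical export macro (not the one defined in CudaRuntimeWrappers).
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_WRAPPER_EXPORT __attribute__((visibility("default")))
#else
#define EXAMPLE_WRAPPER_EXPORT
#endif

// Placing the macro in one fixed position, directly after `extern "C"`,
// gives each wrapper C linkage and default visibility, so the symbol stays
// exported even when the library is compiled with -fvisibility=hidden.
extern "C" EXAMPLE_WRAPPER_EXPORT int mgpuExampleAdd(int a, int b) {
  return a + b;
}
```

Applying the same annotation to every wrapper ensures that no function is accidentally left hidden.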
Fangrui Song
a3ef858968 [mlir,polly] Replace uses of IRBuilder::getInt8PtrTy with getPtrTy. NFC 2023-11-27 20:58:25 -08:00
Aart Bik
1944c4f76b
[mlir][sparse] rename DimLevelType to LevelType (#73561)
The "Dim" prefix is a legacy leftover that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
2023-11-27 14:27:52 -08:00
Guray Ozen
f21a70f9fe
[mlir][cuda] Guard mgpuLaunchClusterKernel for Cuda 12.0+ (NFC) (#73495) 2023-11-27 11:50:46 +01:00
Guray Ozen
edf5cae739
[mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA in the GPU dialect by extending
`gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA
functionality as optional parameters. This lets us leverage mechanisms
we already have in the GPU dialect, such as outlining and kernel
launching, making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimension Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR. 

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
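This host-side sketch shows the index arithmetic behind the cluster Ops in edf5cae739. It makes several assumptions. The `clusters in` operands are taken to give the cluster shape in blocks, and `blocks in` is taken to give the grid shape in blocks. `gpu.cluster_dim` is assumed to report clusters per grid (PTX `%nclusterid`), and `gpu.cluster_id` is assumed to report a block's cluster index (PTX `%clusterid`). The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Dim3 = std::array<std::uint32_t, 3>;

// Hypothetical helper: clusters per grid in each dimension (what
// gpu.cluster_dim is assumed to return). The grid must tile evenly.
Dim3 clustersPerGrid(const Dim3 &gridBlocks, const Dim3 &clusterBlocks) {
  Dim3 result{};
  for (int d = 0; d < 3; ++d) {
    assert(clusterBlocks[d] != 0 && gridBlocks[d] % clusterBlocks[d] == 0);
    result[d] = gridBlocks[d] / clusterBlocks[d];
  }
  return result;
}

// Hypothetical helper: the cluster a block belongs to (what gpu.cluster_id
// is assumed to return for the block with index blockId).
Dim3 clusterIdOf(const Dim3 &blockId, const Dim3 &clusterBlocks) {
  Dim3 result{};
  for (int d = 0; d < 3; ++d)
    result[d] = blockId[d] / clusterBlocks[d];
  return result;
}
```

For example, an 8x4x1 grid with 2x1x1 clusters gives 4x4x1 clusters, and block (5, 3, 0) belongs to cluster (2, 3, 0).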
Ivan Butygin
0bda20b8be
Reland [mlir] Workaround for export lib generation on Windows for mlir_arm_sme_abi_stubs #73147 (#73238)
https://github.com/llvm/llvm-project/pull/73147

Fixed the visibility macro
2023-11-23 16:59:17 +03:00
Ivan Butygin
bf353a71a2 Revert "[mlir] Workaround for export lib generation on Windows for mlir_arm_sme_abi_stubs (#73147)"
This reverts commit 6248c24876d81d83544af02399d46813dbea869c.

broke the bots
2023-11-23 13:32:42 +01:00
Ivan Butygin
6248c24876
[mlir] Workaround for export lib generation on Windows for mlir_arm_sme_abi_stubs (#73147)
Using the MLIR CMake package in a downstream project fails with the following error:
```
CMake Error at D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake:2537 (message):
  The imported target "mlir_arm_sme_abi_stubs" references the file

     "D:/projs/llvm/llvm-install/lib/mlir_arm_sme_abi_stubs.lib"

  but this file does not exist.  Possible reasons include:

  * The file was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and contained

     "D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake"

  but not all the files it references.

Call Stack (most recent call first):
  D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRConfig.cmake:37 (include)
  mlir/CMakeLists.txt:5 (find_package)
```

Windows CMake needs export libraries, but it seems they are only
generated if there is at least one exported symbol.
Add export attributes to symbols.

Not sure what the best approach to fixing this is (probably we should
just disable this lib on Windows entirely), but it fixed things for me
locally.
2023-11-23 15:23:01 +03:00
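The following sketch shows a cross-platform export macro of the kind commits 6248c24876 and 0bda20b8be describe. `EXAMPLE_STUBS_EXPORT` and `exampleStubState` are hypothetical names. On Windows the macro expands to `__declspec(dllexport)`. Once the DLL exports at least one symbol, the MSVC linker also writes the import library (`.lib`) that the installed CMake targets reference. On ELF platforms the macro falls back to default visibility.

```cpp
// Hypothetical macro: choose the platform's export annotation.
#if defined(_WIN32)
#define EXAMPLE_STUBS_EXPORT __declspec(dllexport)
#elif defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_STUBS_EXPORT __attribute__((visibility("default")))
#else
#define EXAMPLE_STUBS_EXPORT
#endif

// A single explicitly exported symbol is enough for an import library
// to be produced when this file is built as a Windows DLL.
extern "C" EXAMPLE_STUBS_EXPORT int exampleStubState() { return 0; }
```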
Benjamin Maxwell
783ac3b6fb
[mlir][ArmSME] Make use of backend function attributes for enabling ZA storage (#71044)
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).

Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.

Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.

A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
2023-11-14 12:50:38 +00:00
Aart Bik
4f183b1f6e
[mlir][sparse] remove obsoleted output methods from runtime (#70523)
Our CODE and LIB are more unified every day!
2023-10-27 16:58:41 -07:00
Youngsuk Kim
645b7795d4 [mlir] Remove no-op ptr-to-ptr bitcasts (NFC)
Opaque pointer cleanup effort. NFC.
2023-10-26 13:01:23 -05:00
Nishant Patel
7fa19e6f4b
[MLIR] Add SyclRuntimeWrapper (#69648) 2023-10-26 19:41:09 +02:00
Benjamin Maxwell
274ce8895b
[mlir] Remove printCString() from RunnerUtils (#70197)
This is now unused and can be replaced with `printString()` from
CRunnerUtils or `vector.print str`.
2023-10-26 10:07:23 +01:00
Guray Ozen
5ef45c02dc
[mlir][cuda] Avoid driver call to check max shared memory (#70021)
This PR guards the driver call with an if-statement, since driver calls
are more expensive than the check itself.

As a future todo, the if statement could be generated by the compiler
and thus optimized in some cases.
2023-10-26 11:02:32 +03:00
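Here is a minimal sketch of guarding an expensive query behind a cheap check, as 5ef45c02dc describes. It assumes the guard tests whether any dynamic shared memory was requested; the exact condition in the commit may differ. `queryMaxDynamicSharedMemory` is a hypothetical stand-in for the driver call that only counts how often it runs, and its returned limit is an arbitrary assumed value.

```cpp
#include <cstdint>

// Hypothetical stand-in for an expensive driver attribute query.
static int driverQueryCount = 0;
static std::int32_t queryMaxDynamicSharedMemory() {
  ++driverQueryCount;
  return 227 * 1024; // assumed device limit, for illustration only
}

// Only consult the "driver" when the launch actually requests dynamic
// shared memory; the common zero-byte case skips the query entirely.
bool validateSharedMemory(std::int32_t sharedMemBytes) {
  if (sharedMemBytes > 0) {
    std::int32_t maxShmem = queryMaxDynamicSharedMemory();
    if (sharedMemBytes > maxShmem)
      return false;
  }
  return true;
}
```

Launches without dynamic shared memory never pay for the query.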
Benjamin Maxwell
3be3883e6d
[mlir][VectorOps] Support string literals in vector.print (#68695)
Printing strings within integration tests is currently quite annoyingly
verbose, and can't be tucked into shared helpers as the types depend on
the length of the string:

```
llvm.mlir.global internal constant @hello_world("Hello, World!\0")

func.func @entry() {
  %0 = llvm.mlir.addressof @hello_world : !llvm.ptr<array<14 x i8>>
  %1 = llvm.mlir.constant(0 : index) : i64
  %2 = llvm.getelementptr %0[%1, %1]
    : (!llvm.ptr<array<14 x i8>>, i64, i64) -> !llvm.ptr<i8>
  llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
  return
}
```

So this patch adds a simple extension to `vector.print` to simplify
this:
```
func.func @entry() {
   // Print a vector of characters ;)
   vector.print str "Hello, World!"
   return
}
```

Most of the logic for this is now shared with `cf.assert` which already
does something similar.

Depends on #68694
2023-10-24 09:34:14 +01:00
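The commit message for 3be3883e6d says the verbose form cannot be shared across strings because the type depends on the string's length. This C++ sketch shows the same effect with C-style literals. `literalStorageBytes` is a hypothetical helper that deduces the array size from the literal's type.

```cpp
#include <cstddef>

// Hypothetical helper: storage size of a string literal, deduced from its
// array type const char[N]; a different length means a different type.
template <std::size_t N>
constexpr std::size_t literalStorageBytes(const char (&)[N]) {
  return N;
}

// "Hello, World!" has 13 visible characters; the trailing '\0' makes 14,
// matching the `array<14 x i8>` type in the verbose listing above.
static_assert(literalStorageBytes("Hello, World!") == 14, "13 chars + NUL");
```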
Aart Bik
e6005d5a9c
[mlir][sparse] support 2:4 structured sparsity and loose compressed (#69968)
This adds library support for these two new level formats.
2023-10-23 15:34:45 -07:00
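For readers unfamiliar with the 2:4 format added in e6005d5a9c, here is a minimal checker for the usual 2:4 structured-sparsity rule: at most two nonzeros in every aligned group of four consecutive entries. It is an illustration of the pattern, not the library's storage code. `isTwoFourSparse` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Returns true if every aligned group of four consecutive entries in `row`
// holds at most two nonzeros. Rows whose length is not a multiple of four
// are rejected.
bool isTwoFourSparse(const std::vector<double> &row) {
  if (row.size() % 4 != 0)
    return false;
  for (std::size_t base = 0; base < row.size(); base += 4) {
    int nonzeros = 0;
    for (std::size_t k = 0; k < 4; ++k)
      if (row[base + k] != 0.0)
        ++nonzeros;
    if (nonzeros > 2)
      return false;
  }
  return true;
}
```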
Kazu Hirata
5a98dd6734 [mlir] Remove an extraneous typename (NFC) 2023-10-22 10:42:16 -07:00
Brad Smith
a157a82b1e [mlir] Avoid including <alloca.h> on DragonFly 2023-10-21 01:19:34 -04:00
Aart Bik
48962383ad
[mlir][sparse] tiny cleanup making local 'using' explicit (#69740) 2023-10-20 12:41:08 -07:00
Aart Bik
306f4c306a
[mlir][sparse] implement non-permutation MapRef encoding (#69406)
This enables reading block sparse from file using libgen! (and soon also
direct IR codegen)
2023-10-18 13:01:12 -07:00
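A non-permutation map is one where the level coordinates are not just a reordering of the dimension coordinates. A common example is 2x2 block sparsity, which maps `(i, j)` to `(i floordiv 2, j floordiv 2, i mod 2, j mod 2)`. The sketch below shows that map and its inverse as plain functions. The function names are hypothetical; this is not the MapRef encoding itself.

```cpp
#include <array>
#include <cstdint>

using Coord2 = std::array<std::uint64_t, 2>;
using Coord4 = std::array<std::uint64_t, 4>;

// dim2lvl for 2x2 blocks: (i, j) -> (i / 2, j / 2, i % 2, j % 2),
// i.e. block row, block column, row within block, column within block.
Coord4 dim2lvlBlock2x2(const Coord2 &d) {
  return {d[0] / 2, d[1] / 2, d[0] % 2, d[1] % 2};
}

// lvl2dim inverts it: (bi, bj, ii, jj) -> (2 * bi + ii, 2 * bj + jj).
Coord2 lvl2dimBlock2x2(const Coord4 &l) {
  return {2 * l[0] + l[2], 2 * l[1] + l[3]};
}
```

Four level coordinates for two dimensions is exactly what a permutation cannot express, which is why the library needed a more general map.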
Aart Bik
d816c221b4
[mlir][sparse] complete migration to dim2lvl/lvl2dim in library (#69268)
This last revision completed the migration to non-permutation support in
the SparseTensor library. All mappings are now controlled by the MapRef
(forward and backward). Unused code has been removed, which simplifies
subsequent testing of block sparsity.
2023-10-17 09:32:22 -07:00
Aart Bik
233c3e6c53
[mlir][sparse] remove sparse2sparse path in library (#69247)
This cleans up all external entry points that will have to deal with
non-permutations, making any subsequent refactoring much more local to
the lib files.
2023-10-16 14:45:57 -07:00
Aart Bik
d392073f67
[mlir][sparse] simplify reader construction of new sparse tensor (#69036)
Making the materialize-from-reader method part of the Swiss army knife
suite again removes a lot of redundant boilerplate code and unifies the
parameter setup into a single centralized utility. Furthermore, we have
now minimized the number of entry points into the library that need a
non-permutation map setup, simplifying what comes next.
2023-10-16 10:25:37 -07:00
Aart Bik
9bd5bfc689
[mlir][sparse] remove unused sparse tensor iterator (#68951) 2023-10-12 22:51:07 -07:00
Aart Bik
2045cca0c3
[mlir][sparse] add a forwarding insertion to SparseTensorStorage (#68939) 2023-10-12 21:03:07 -07:00
Peiming Liu
f248d0b28d
[mlir][sparse] implement sparse_tensor.reorder_coo (#68916)
As a side effect of the change, it also unifies the convertOp
implementation between lib/codegen path.
2023-10-12 13:22:45 -07:00
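One way to picture what `sparse_tensor.reorder_coo` does is to sort COO entries into the coordinate order an ordered COO format expects. This minimal sketch assumes a 2-D matrix whose level order matches its dimension order. It sorts a plain array of hypothetical `CooElement` records lexicographically, and does not model the actual storage or the dim-to-level map.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical COO element: two coordinates plus a value.
struct CooElement {
  std::uint64_t i, j;
  double value;
};

// Sort entries lexicographically by (i, j), the order an ordered COO
// level format expects.
void sortCoo(std::vector<CooElement> &coo) {
  std::sort(coo.begin(), coo.end(),
            [](const CooElement &a, const CooElement &b) {
              return std::tie(a.i, a.j) < std::tie(b.i, b.j);
            });
}
```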
Aart Bik
db1d40f319
[mlir][sparse] refactor dim2lvl/lvl2dim passing into MapRef (#68649)
This revision refactors all "swiss army knife" entry points to pass
dim2lvl/lvl2dim mapping, so that the callee can construct a MapRef
(shown for SparseTensorStorage class). This is a next step towards
completely centralizing mapping code into a single MapRef class.
2023-10-11 09:15:07 -07:00
Aart Bik
ab6334dd11
[mlir][sparse] add expanded size to API (#68614)
Used for asserting we do not run out of bounds on the expanded access
pattern.
2023-10-09 14:42:11 -07:00
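The sketch below shows why ab6334dd11 passes the expanded size: it lets the runtime bounds-check the dense workspace. It assumes the workspace consists of a dense value buffer of `expsz` slots, a per-slot "filled" flag, and a list of touched slots. `ExpandedRow` is a hypothetical name, and its layout is not taken from the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical expanded-access workspace for one row.
struct ExpandedRow {
  std::vector<double> values;        // dense scratch values, expsz slots
  std::vector<bool> filled;          // whether a slot has been touched
  std::vector<std::uint64_t> added;  // touched slots, in insertion order

  explicit ExpandedRow(std::uint64_t expsz)
      : values(expsz, 0.0), filled(expsz, false) {}

  void accumulate(std::uint64_t idx, double v) {
    // Knowing expsz lets us assert before touching the buffers.
    assert(idx < values.size() && "expanded access out of bounds");
    if (!filled[idx]) {
      filled[idx] = true;
      added.push_back(idx);
    }
    values[idx] += v;
  }
};
```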
Aart Bik
b7188d2877
[mlir][sparse] replace specialized buffer setup with util code (#68461)
This completely centralizes all setup related to dim2lvl and lvl2dim
for the runtime library (and even parts of direct IR codegen) into one
place! All of it is compatible with the MapRef data structure, which
should be used in all remaining clients of dim2lvl and lvl2dim.

NOTE: the convert_x2y.mlir tests were becoming too overloaded,
      so I decided to bring them back to the basics; if, e.g.,
      more coverage of the foreach is required, they should
      go into isolated smaller tests.
2023-10-09 08:50:59 -07:00
Aart Bik
d3af65358d
[mlir][sparse] introduce MapRef, unify conversion/codegen for reader (#68360)
This revision introduces a MapRef, which will support a future
generalization beyond permutations (e.g. block sparsity). This revision
also unifies the conversion/codegen paths for the sparse_tensor.new
operation from file (e.g., the readers). Note that more unification is
planned as well as general affine dim2lvl and lvl2dim (all marked with
TODOs).
2023-10-06 13:42:01 -07:00
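This sketch shows the permutation case that MapRef starts from. A dim2lvl permutation is applied forward from dimension to level coordinates, and its inverse is applied backward. `PermutationMap` is a hypothetical, simplified class that only borrows the forward/backward vocabulary. It is not the actual MapRef interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical permutation-only map: level l stores dimension perm[l].
class PermutationMap {
public:
  explicit PermutationMap(std::vector<std::uint64_t> dim2lvlPerm)
      : perm(std::move(dim2lvlPerm)) {}

  // dim2lvl: lvlCoords[l] = dimCoords[perm[l]]
  void pushforward(const std::uint64_t *dimCoords,
                   std::uint64_t *lvlCoords) const {
    for (std::size_t l = 0; l < perm.size(); ++l)
      lvlCoords[l] = dimCoords[perm[l]];
  }

  // lvl2dim, the inverse: dimCoords[perm[l]] = lvlCoords[l]
  void pushbackward(const std::uint64_t *lvlCoords,
                    std::uint64_t *dimCoords) const {
    for (std::size_t l = 0; l < perm.size(); ++l)
      dimCoords[perm[l]] = lvlCoords[l];
  }

private:
  std::vector<std::uint64_t> perm;
};
```

With perm = {1, 0} (a column-major layout of a matrix), dimension coordinates (3, 7) become level coordinates (7, 3), and pushing backward restores (3, 7).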
Aart Bik
427f120f60
[mlir][sparse] minor edits in runtime lib Cpp files (#68165) 2023-10-03 16:28:54 -07:00
JOE1994
204883623e [NFC] Replace uses of Type::getPointerTo
Replace some uses of `Type::getPointerTo` in two ways:
* Remove entirely if it's only used to support an unnecessary bitcast
  (remove the bitcast as well).
* Replace with `PointerType::get`/`PointerType::getUnqual`

NFC opaque pointer clean-up effort.
2023-09-29 21:38:53 -04:00
Aart Bik
7ac330a461
[mlir][sparse][gpu] protect BSR method with cuda 12.1 (#67728)
The official MLIR build is not quite at CUDA 12.1 yet, so until then we
protect the BSR method with a macro guard.
2023-09-28 12:58:01 -07:00
Aart Bik
39038177ee
[mlir][sparse][gpu] add CSC and BSR format to cuSparse GPU ops (#67509)
This adds two cuSparse formats to the GPU dialect support, together with
proper lowering and runtime CUDA support. It also fixes a few minor
omissions.
2023-09-27 09:32:25 -07:00
Nishant Patel
1002a1d058
[MLIR] Pass hostShared flag in gpu.alloc op to runtime wrappers (#66401)
This PR is split out of the big PR
https://github.com/llvm/llvm-project/pull/65539, which enables Intel GPU
integration. In this PR we pass the hostShared flag to the runtime
wrappers (required by SyclRuntimeWrappers, which will come in a
subsequent PR) to indicate whether the allocation is done in host-shared
GPU memory or in device-only memory.
2023-09-26 15:32:11 -07:00
Nishant Patel
ebfea261e6
[MLIR] Pass count of parameters & gpu binary size to runtime wrappers (#66154)
This PR is split out of the big PR #65539, which enables Intel GPU
integration. In this PR we pass the number of parameters and the size of
the GPU binary to the runtime wrappers, since SyclRuntimeWrappers (which
will come in a subsequent PR) requires the SPIR-V size for compilation
and the number of parameters to iterate over the params.
2023-09-26 11:27:07 -07:00
Guray Ozen
2379432c88
[MLIR] Correct Initial TMA Descriptor Values (#67397) 2023-09-26 09:20:46 +02:00
Aart Bik
8998bcfbce
[mlir][sparse][gpu] refine type of workspace size variables (#66438)
Rationale:
Some compiler settings don't like the size_t vs uint64_t setup.
2023-09-14 15:49:52 -07:00
Fabian Mora
5093413a50
[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220)
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to hold any extra
options.

2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.

3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.

4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
2023-09-14 18:00:27 -04:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
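The following sketch shows why switching to `enum class` makes bad call sites easy to find. An unscoped enumerator converts silently to `int`, but a scoped one does not, so a call that passes the enum where an integer is now expected fails to compile. The enums here are hypothetical local stand-ins that mirror the enumerator spelling. They are not the LLVM headers.

```cpp
#include <type_traits>

// Hypothetical stand-ins: an unscoped enum nested in a namespace (old
// style) versus a scoped enum class (new style).
namespace OldCodeGenOpt {
enum Level { None, Less, Default, Aggressive };
}
enum class NewCodeGenOptLevel { None, Less, Default, Aggressive };

// The unscoped enumerator converts to int implicitly...
static_assert(std::is_convertible<OldCodeGenOpt::Level, int>::value, "");
// ...the scoped one does not, so mismatched call sites become errors.
static_assert(!std::is_convertible<NewCodeGenOptLevel, int>::value, "");

// Call sites change spelling, e.g. OldCodeGenOpt::Aggressive becomes:
constexpr NewCodeGenOptLevel kLevel = NewCodeGenOptLevel::Aggressive;
```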
Aart Bik
156a4ba9b4
[mlir][sparse] deprecate the convert{To,From}MLIRSparseTensor methods (#66304)
Rationale:
These libraries provided COO input and output at external boundaries
which, since then, has been generalized to the much more powerful pack
and unpack operations of the sparse tensor dialect.
2023-09-14 10:02:29 -07:00
Aart Bik
3635c74375
[mlir][gpu][sparse] gracefully accept zero size allocation (#66127)
This cleans up unnecessary code that changed zero-size allocations to
avoid the following error message:

'cuMemAlloc(&ptr, sizeBytes)' failed with 'CUDA_ERROR_INVALID_VALUE'
2023-09-12 13:07:24 -07:00
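One way for a runtime wrapper to accept zero-size allocations gracefully is to short-circuit them before they reach the backend allocator. This sketch assumes that approach; the commit's exact handling may differ. `exampleMemAlloc` and `exampleMemFree` are hypothetical, and `std::malloc` stands in for the device allocation.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical wrapper: a zero-byte request returns nullptr instead of
// being forwarded to a backend (such as cuMemAlloc) that rejects size 0.
void *exampleMemAlloc(std::uint64_t sizeBytes) {
  if (sizeBytes == 0)
    return nullptr;
  return std::malloc(sizeBytes); // stand-in for the device allocation
}

// free(nullptr) is a no-op, so releasing a zero-size allocation is safe.
void exampleMemFree(void *ptr) { std::free(ptr); }
```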
Guray Ozen
1dc0071216 [MLIR] Guard Cuda 12.0+ newer driver APIs with CUDA_VERSION macro checks
Fixes #64529
https://github.com/llvm/llvm-project/issues/64529

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D159440
2023-09-06 08:17:06 +02:00
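The next sketch shows the kind of `CUDA_VERSION` guard that commit 1dc0071216 describes. `cuda.h` defines `CUDA_VERSION` as 1000 * major + 10 * minor, so 12.0 is 12000 and 12.1 is 12010. Here the macro is defined by hand only so the sketch compiles without the CUDA toolkit; the value 11080 (CUDA 11.8) is an assumption. `hasCuda12DriverApis` is a hypothetical function.

```cpp
// Assumption for the sketch: pretend to build against CUDA 11.8 when
// cuda.h is not included. Real code gets CUDA_VERSION from cuda.h.
#ifndef CUDA_VERSION
#define CUDA_VERSION 11080
#endif

// Hypothetical query: reports whether the CUDA 12.0+ code path was
// compiled in.
bool hasCuda12DriverApis() {
#if CUDA_VERSION >= 12000
  // Calls to driver entry points introduced in CUDA 12.0 would go here.
  return true;
#else
  // Older toolkits compile a fallback (or an error report) instead.
  return false;
#endif
}
```

Checking the macro at compile time, rather than at run time, keeps the build working on toolkits whose headers do not declare the newer entry points at all.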
Aart Bik
9ce445b8c7 [mlir][sparse] simplification of sparse runtime support lib
Incorporated two header files directly into others, since
other parts were used (and keeping them separate made it hard to find
the definitions). Removed TODOs that are unlikely to be done.

Reviewed By: yinying-lisa-li

Differential Revision: https://reviews.llvm.org/D159381
2023-09-01 14:00:19 -07:00
Mehdi Amini
471004c5c9 Revert "[mlir][sparse] simplification of sparse runtime support lib"
This reverts commit 14c58cf5c39a39a335893bc98493c5edc75a91b3.

The gcc7 build is broken.
2023-09-01 11:50:14 -07:00
Aart Bik
14c58cf5c3 [mlir][sparse] simplification of sparse runtime support lib
Incorporated two header files directly into others, since
other parts were used (and keeping them separate made it hard to find
the definitions). Removed TODOs that are unlikely to be done.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D159330
2023-09-01 09:28:48 -07:00
Aart Bik
b86d3cbc12 [mlir][sparse] complete various FIXMEs in sparse support lib
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D159245
2023-08-30 21:30:25 -07:00
Peiming Liu
fa6726e27b [mlir][sparse] supports sparse_tensor.pack on libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D158012
2023-08-15 20:20:54 +00:00
Aart Bik
289f7231f9 [mlir][sparse][gpu] minor code cleanup for sparse gpu ops
Consistent order of ops and related methods.
Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp
since this is a general utility for sparse matrices,
not specific to GEMM ops only.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D157922
2023-08-14 15:08:57 -07:00
Aart Bik
95a6c509c9 [mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugs
Rationale:
Since we only support default algorithm for SpGEMM, we can remove the
estimate op (for now at least). This also introduces the set csr pointers
op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown.
This revision paves the way for actual recognition of SpGEMM in the sparsifier.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157645
2023-08-10 13:52:47 -07:00