llvm-project

Aart Bik ccd047cba4 [mlir][sparse] optimize COO index handling

By using a shared index pool, we reduce the footprint of each "Element"
in the COO scheme and, in addition, reduce the overhead of allocating
indices (trading many allocations of vectors for allocations in a single
vector only). When the capacity is known, this means *all* allocation
can be done in advance.
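
The shared-pool idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `Element`, `CooSketch`, `indexPool_`, and the choice to store an offset into the pool (rather than a pointer) are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical element layout: instead of owning a vector of indices,
// each element only records where its indices start in a shared pool.
struct Element {
  std::size_t offset; // position of the first index in the pool
  double value;
};

// Hypothetical COO container (illustration only).
class CooSketch {
public:
  // When the capacity is known, both vectors are reserved up front,
  // so adding elements triggers no further allocation.
  explicit CooSketch(std::size_t rank, std::size_t capacity = 0)
      : rank_(rank) {
    if (capacity > 0) {
      elements_.reserve(capacity);
      indexPool_.reserve(capacity * rank);
    }
  }

  // Appends the indices to the shared pool and records their offset.
  void add(const std::vector<std::uint64_t> &ind, double value) {
    assert(ind.size() == rank_ && "index count must match the rank");
    elements_.push_back({indexPool_.size(), value});
    indexPool_.insert(indexPool_.end(), ind.begin(), ind.end());
  }

  // Sorts elements lexicographically by index. Only the small Element
  // records move; the pool itself is left untouched.
  void sort() {
    const std::uint64_t *pool = indexPool_.data();
    const std::size_t rank = rank_;
    std::sort(elements_.begin(), elements_.end(),
              [pool, rank](const Element &a, const Element &b) {
                return std::lexicographical_compare(
                    pool + a.offset, pool + a.offset + rank,
                    pool + b.offset, pool + b.offset + rank);
              });
  }

  std::size_t size() const { return elements_.size(); }
  double value(std::size_t e) const { return elements_[e].value; }
  std::uint64_t index(std::size_t e, std::size_t d) const {
    return indexPool_[elements_[e].offset + d];
  }

private:
  std::size_t rank_;
  std::vector<Element> elements_;
  std::vector<std::uint64_t> indexPool_; // all indices, stored contiguously
};
```

Storing offsets keeps the sketch valid even if the pool reallocates when no capacity is given; a pointer-based variant would need to rebase its pointers after each reallocation.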

This is a big win. For example, reading matrix SK-2005, with dimensions
50,636,154 x 50,636,154 and 1,949,412,601 nonzero elements, becomes
about 3.5x faster overall (times in ms):

```
SK-2005        before         after   speedup
---------------------------------------------
read       305,086.65    180,318.12      1.69
sort     2,836,096.23    510,492.87      5.56
pack       364,485.67    312,009.96      1.17
---------------------------------------------
TOTAL    3,505,668.56  1,002,820.95      3.50
```

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D124502

2022-04-27 10:20:47 -07:00

AsyncRuntime.cpp

[async] Get the number of worker threads from the runtime.

2022-01-31 12:06:01 -08:00

CMakeLists.txt

Reland [mlir] Remove uses of LLVM's legacy pass manager

2022-04-11 16:53:32 -07:00

CRunnerUtils.cpp

[mlir] Add msan memory unpoisoning macros to mlir ExecutionEngine

2022-04-11 18:58:28 -07:00

CudaRuntimeWrappers.cpp

Revert "Fix CUDA runtime wrapper for GPU mem alloc/free to async"