By using a shared index pool, we reduce the footprint of each "Element"
in the COO scheme and, in addition, reduce the overhead of allocating
indices (trading many allocations of vectors for allocations in a single
vector only). When the capacity is known, this means *all* allocation
can be done in advance.
This is a big win. For example, reading matrix SK-2005, with dimensions
50,636,154 x 50,636,154 and 1,949,412,601 nonzero elements improves
as follows (time in ms), or about 3.5x faster overall
```
SK-2005 before after speedup
---------------------------------------------
read 305,086.65 180,318.12 1.69
sort 2,836,096.23 510,492.87 5.56
pack 364,485.67 312,009.96 1.17
---------------------------------------------
TOTAL 3,505,668.56 1,002,820.95 3.50
```
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D124502
Switch CUDA runtime wrapper for GPU mem alloc/free to async. The
semantics of the GPU dialect ops (gpu.alloc/dealloc) and the wrappers it
lowered to (gpu-to-llvm) was for the async versions -- however, this was
being incorrectly mapped to cuMemAlloc/cuMemFree instead of
cuMemAllocAsync/cuMemFreeAsync.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D123482
Adding annotations on as-needed bases, currently only for memrefCopy, but in general all C API functions that take pointers to memory allocated/initialized inside the jit-compiled code must be annotated, to be able to run with msan.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123557
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reland: Fixed custom calls to `-lower-matrix-intrinsics` in integration tests by replacing them with `-O0 -enable-matrix`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425
Use the new pass manager.
This also removes the ability to run arbitrary sets of passes. Not sure if this functionality is used, but it doesn't seem to be tested.
No need to initialize passes outside of constructing the PassBuilder with the new pass manager.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D123425
This avoids a rather big bug where we were reserving
dense space for the ptx/idx in the first sparse dimension.
For example, using CSR for a 140874 x 140874 matrix with
3977139 nonzero would reserve the full 19845483876 space.
This revision fixes this for now, but we need to revisit
the reservation heuristic to make this better.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D123166
This removes tens of warnings from build logs that we can't do
anything about.
Reviewed By: pcf000
Differential Revision: https://reviews.llvm.org/D122927
This change introduces two new methods: `finalizeSegment` and `appendIndex`; and removes three old methods: `endDim`, `appendCurrentPointer`, `appendIndex`. The two new methods better encapsulate their algorithms, thus allowing to remove repetitious code in several other places.
Depends On D122435
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D122625
Prior to this change there were a number of places where the allocation and deallocation of SparseTensorCOO objects were not cleanly paired, leading to inconsistencies regarding whether each function released its tensor/coo arguments or not, as well as making it easy to run afoul of memory leaks, use-after-free, or double-free errors. This change cleans up the codegen vs runtime boundary to resolve those issues. Now, the only time the runtime library frees an object is either (a) because it's a function explicitly designed to do so, or (b) because the allocated object is entirely local to the function and would be a memory leak if not released. Thus, now the codegen takes complete responsibility for releasing any objects it caused to be allocated.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D122435
I'm using "shape" to mean the compile-time object, where zeros indicate sizes which are compile-time dynamic; and using "sizes" to mean the run-time object, where zeros indicate a dimension with no coordinates (hence resulting in trivial storage). Because their semantics differ on zeros, it's important to keep them distinguished. Although we do not define separate C++ types to capture the distinction, we can at least use variable names to do so.
This is (tangential) work towards fixing: https://github.com/llvm/llvm-project/issues/51652
Depends On D122057
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D122058
This is the more logical place for the function to live. If/when we factor out a separate class for just the `Coordinates` themselves, then the definition should be moved to `Coordinates::lexOrder` (and `Element::lexOrder` would become a thin wrapper delegating to that function).
This is (tangentially) work towards fixing: https://github.com/llvm/llvm-project/issues/51652
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D122057
This removes any potential confusion with the `getType` accessors
which correspond to SSA results of an operation, and makes it
clear what the intent is (i.e. to represent the type of the function).
Differential Revision: https://reviews.llvm.org/D121762
The enableObjectCache option was added in
https://reviews.llvm.org/rG06e8101034e, defaulting to false. However,
the init code added there got its logic reversed
(cache(enableObjectCache ? nullptr : new SimpleObjectCache()), which was
fixed in https://reviews.llvm.org/rGd1186fcb04 by setting the default to
true, thereby preserving the existing behavior even if it was
unintentional.
Default now the object cache to false as it was originally intended.
While at it, mention in enableObjectCache's documentation how the
cache can be dumped.
Reviewed-by: mehdi_amini
Differential Revision: https://reviews.llvm.org/D121291
The current StandardToLLVM conversion patterns only really handle
the Func dialect. The pass itself adds patterns for Arithmetic/CFToLLVM, but
those should be/will be split out in a followup. This commit focuses solely
on being an NFC rename.
Aside from the directory change, the pattern and pass creation API have been renamed:
* populateStdToLLVMFuncOpConversionPattern -> populateFuncToLLVMFuncOpConversionPattern
* populateStdToLLVMConversionPatterns -> populateFuncToLLVMConversionPatterns
* createLowerToLLVMPass -> createConvertFuncToLLVMPass
Differential Revision: https://reviews.llvm.org/D120778
There is no reason for this file to be at the top-level, and
its current placement predates the Parser/ folder's existence.
Differential Revision: https://reviews.llvm.org/D121024
Mark `parseSourceFile()` deprecated. The functions will be removed two weeks after landing this change.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D121075
This clarifies that these methods only work in append mode, not for general insertions. This is a prospective change towards https://github.com/llvm/llvm-project/issues/51652 which also performs random-access insertions, so we want to avoid confusion.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120929
The last remaining operations in the standard dialect all revolve around
FuncOp/function related constructs. This patch simply handles the initial
renaming (which by itself is already huge), but there are a large number
of cleanups unlocked/necessary afterwards:
* Removing a bunch of unnecessary dependencies on Func
* Cleaning up the From/ToStandard conversion passes
* Preparing for the move of FuncOp to the Func dialect
See the discussion at https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061
Differential Revision: https://reviews.llvm.org/D120624
Previously, convertToMLIRSparseTensor assumes identity storage ordering and all
compressed dimensions. This change extends the function with two parameters for
users to specify the storage ordering and the sparsity of each dimension.
Modify PyTACO to reflect this change.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120643
By specifying a sectionMemoryMapper, users can control how
memory for JIT code is allocated.
In particular, I need this in order to use a named memory
region so that profilers such as perf(1) can correctly label
execution cycles coming from JIT'ed code.
Reviewed-by: ezhulenev
Differential Revision: https://reviews.llvm.org/D120415
Its number of optional parameters has grown too large,
which makes adding new optional parameters quite a chore.
Fix this by using an options struct.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D120380
These routines will need to be specialized a lot more based on value types,
index types, pointer types, and permutation/dimension ordering. This is a
careful first step, providing some functionality needed in PyTACO bridge.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D120154
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D119883
While testing LLVM 14.0.0 rc1 on Solaris, I ran into a compile failure:
from /var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/lib/ExecutionEngine/SparseTensorUtils.cpp:22:
/usr/include/sys/types.h:103:16: error: conflicting declaration ‘typedef short int index_t’
103 | typedef short index_t;
| ^~~~~~~
In file included from
/var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/lib/ExecutionEngine/SparseTensorUtils.cpp:17:
/var/llvm/llvm-14.0.0-rc1/rc1/llvm-project/mlir/include/mlir/ExecutionEngine/SparseTensorUtils.h:26:7:
note: previous declaration as ‘using index_t = uint64_t’
26 | using index_t = uint64_t;
| ^~~~~~~
The same issue had already occured in the past and fixed in D72619
<https://reviews.llvm.org/D72619>. More detailed explanation can also be
found there.
Tested on `amd64-pc-solaris2.11` and `sparcv9-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D119323
Rationale:
Although file I/O is a bit alien to MLIR itself, we provide two convenient ways
for sparse tensor I/O. The input part was already there (behind the swiss army
knife sparse_tensor.new). Now we have a sparse_tensor.out to write out data. As
before, the ops are kept vague and may change in the future. For now this
allows us to compare TACO vs MLIR very easily.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D117850