llvm-project

Author	SHA1	Message	Date
Alex Zinenko	6403e1b12a	[mlir] add a dynamic user-after-parent-freed transform dialect check In the transform dialect, a transform IR handle may be pointing to a payload IR operation that is an ancestor of another payload IR operation pointed to by another handle. If such a "parent" handle is consumed by a transformation, this indicates that the associated operation is likely rewritten, which in turn means that the "child" handle may now be associated with a dangling pointer or a pointer to a different operation than originally. Add a handle invalidation mechanism to guard against such situations by reporting errors at runtime. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127480	2022-06-10 13:05:34 +02:00
Matthias Springer	79f115911e	[mlir][bufferize] Avoid tensor copies when the data is not read There are various shortcuts in `BufferizationState::getBuffer` that avoid a buffer copy when we just need an allocation (and no initialization). This change adds those shortcuts to the TensorCopyInsertion pass, so that `getBuffer` can be simplified in a subsequent change. Differential Revision: https://reviews.llvm.org/D126821	2022-06-10 10:26:07 +02:00
Okwan Kwon	5ccb9df3ba	[mlir] Support passing ostream as argument for the create function. The constructor already supports passing an ostream as argument, so let's make the create function support it too. Differential Revision: https://reviews.llvm.org/D127449	2022-06-09 16:34:22 -07:00
Mogball	a31ff0af9b	[mlir][spirv] Replace StructAttrs with AttrDefs Depends on D127370 Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D127373	2022-06-09 23:16:44 +00:00
Mogball	f1182bd6d5	[mlir][tosa] Replace StructAttrs with AttrDefs Depends on D127352 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127370	2022-06-09 23:01:51 +00:00
Mogball	d7ef488bb6	[mlir][gpu] Move GPU headers into IR/ and Transforms/ Depends on D127350 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127352	2022-06-09 22:49:03 +00:00
Mogball	7bdd3722f2	[mlir][gpu] Change ParalellLoopMappingAttr to AttrDef It was a StructAttr. Also adds a FieldParser for AffineMap. Depends on D127348 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127350	2022-06-09 22:23:21 +00:00
Mogball	ba79bb4973	[mlir][nvvm] Change MMAShapeAttr to AttrDef MMAShapeAttr was a StructAttr Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127348	2022-06-09 22:14:45 +00:00
Matthias Springer	87c770bbd0	[mlir][bufferization][NFC] Put inplacability conflict resolution in op interface The TensorCopyInsertion pass resolves out-of-place bufferization decisions by inserting explicit `bufferization.alloc_tensor` ops. This change moves that functionality into a new BufferizableOpInterface method, so that it can be overridden by op implementations. Some op bufferizations must insert additional `alloc_tensor` ops to make sure that certain aliasing invariants are not violated (e.g., scf::ForOp). This will be addressed in a subsequent change. Differential Revision: https://reviews.llvm.org/D126817	2022-06-09 22:06:44 +02:00
Christopher Bate	9f1221521f	Recommit "[mlir][vector] Allow unroll of contraction in arbitrary order" Fixed issue with vector.contract default unroll permutation. Adds support for vector unroll transformations to unroll in different orders. For example, the vector.contract can be unrolled into a smaller set of contractions. There is a choice of how to unroll the decomposition based on the traversal order of (dim0, dim1, dim2). The choice of traversal order can now be specified by a callback which given by the caller of the transform. For now, only the vector.contract, vector.transfer_read/transfer_write operations support the callback. Differential Revision: https://reviews.llvm.org/D127004	2022-06-09 14:01:19 -06:00
Matthias Springer	3b2004e16b	[mlir][bufferization] Add TensorCopyInsertion pass This pass runs the One-Shot Analysis to find out which tensor OpOperands must bufferize out-of-place. It then rewrites those tensor OpOperands to explicit allocations with a copy in the form of `bufferization.alloc_tensor`. The resulting IR can then be bufferized without having to care about read-after-write conflicts. This change makes it possible to connect One-Shot Analysis to other bufferizations such as the sparse compiler. Differential Revision: https://reviews.llvm.org/D126573	2022-06-09 21:55:52 +02:00
Matthias Springer	56d68e8d7a	[mlir][bufferization] Add optional `copy` operand to AllocTensorOp If `copy` is specified, the newly allocated buffer is initialized with the given contents. Also add an optional `escape` attribute to indicate whether the buffer of the tensor may be returned from the parent block (aka. "escape") after bufferization. This change is in preparation of connecting One-Shot Bufferize to the sparse compiler. Differential Revision: https://reviews.llvm.org/D126570	2022-06-09 21:37:15 +02:00
Matthias Springer	88539c5bdb	[mlir][bufferize][NFC] Decouple dropping of equivalent return values from bufferization This simplifies the bufferization itself and is in preparation of connecting with the sparse compiler. Differential Revision: https://reviews.llvm.org/D126814	2022-06-09 18:39:05 +02:00
Matthias Springer	92680126bf	[mlir][bufferize] Decouple promoteBufferResultsToOutParams from One-Shot Bufferize Users should explicitly run `-buffer-results-to-out-params` instead. The purpose of this change is to remove `finalizeBuffers`, which made it difficult to extend the bufferization to custom buffer types. Differential Revision: https://reviews.llvm.org/D126253	2022-06-09 18:25:26 +02:00
Matthias Springer	058af65e78	[mlir][bufferization] Decouple buffer-deallocation from One-Shot Bufferize The buffer deallocation pass must now be run explicitly when `allow-return-alloc` is set. This results in a few extra buffer copies in unoptimized test cases. The proper way to avoid such copies is to relax the OpOperand/OpResult aliasing contract on ops such as scf.for. Some of these copies can also be avoided by improving the buffer deallocation pass. Differential Revision: https://reviews.llvm.org/D126252	2022-06-09 18:20:39 +02:00
Yuanqiang Liu	56e19717f5	[MLIR][Shape] Generalize `shape.concat` to extent tensors The operation `shape.concat` was used for type shape only. We now enable it for extent tensors. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D127321	2022-06-09 08:23:26 -07:00
Matthias Springer	461dafd2a3	[mlir][bufferization] Add OneShotBufferize transform op This commit allows for One-Shot Bufferize to be used through the transform dialect. No op handle is currently returned for the bufferized IR. Differential Revision: https://reviews.llvm.org/D125098	2022-06-09 15:15:09 +02:00
Alex Zinenko	b6c58ec486	[mlir] add producer fusion to structured transform ops This relies on the existing TileAndFuse pattern for tensor-based structured ops. It complements pure tiling, from which some utilities are generalized. Depends On D127300 Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127319	2022-06-09 14:30:45 +02:00
Alex Zinenko	5f0d4f208e	[mlir] Introduce Transform ops for loops Introduce transform ops for "for" loops, in particular for peeling, software pipelining and unrolling, along with a couple of "IR navigation" ops. These ops are intended to be generalized to different kinds of loops when possible and therefore use the "loop" prefix. They currently live in the SCF dialect as there is no clear place to put transform ops that may span across several dialects, this decision is postponed until the ops actually need to handle non-SCF loops. Additionally refactor some common utilities for transform ops into trait or interface methods, and change the loop pipelining to be a returning pattern. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127300	2022-06-09 11:41:55 +02:00
Mogball	971e13d69e	[mlir][ods] Mark StructAttr as deprecated	2022-06-09 03:23:31 +00:00
Arjun P	4bf9cbc408	[MLIR][Presburger] subtract: improve redundant constraint detection When constraints in the two operands make each other redundant, prefer constraints of the second because this affects the number of sets in the output at each level; reducing these can help prevent exponential blowup. This is accomplished by adding extra overloads to Simplex::detectRedundant that only scan a subrange of the constraints for redundancy. Reviewed By: Groverkss Differential Revision: https://reviews.llvm.org/D127237	2022-06-08 14:44:31 -04:00
wren romano	0371ddf9ad	[mlir] Refactoring the tablegen Tensor types Reduces repetition in tablegen files for defining various tensor types. In particular the goal is to reduce the repetition when defining new tensor types (e.g., D126994). Reviewed By: aartbik, rriddle Differential Revision: https://reviews.llvm.org/D127039	2022-06-08 11:33:48 -07:00
dime10	4f55ed5a1e	Add Python bindings for the OpaqueType Implement the C-API and Python bindings for the builtin opaque type, which was previously missing. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D127303	2022-06-08 19:51:00 +02:00
Mogball	ee70039ae2	[mlir] Fix handling of some region branch terminator successors When `RegionBranchOpInterface::getSuccessorRegions` is called for anything other than the parent op, it expects the operands of the terminator of the source region to be passed, not the operands of the parent op. This was not always respected. This fixes a bug in integer range inference and ForwardDataFlowSolver and changes `scf.while` to allow narrowing of successors using constant inputs. Fixes #55873 Reviewed By: mehdi_amini, krzysz00 Differential Revision: https://reviews.llvm.org/D127261	2022-06-08 17:17:03 +00:00
bixia1	ea8ed5cbcf	[mlir][sparse] Add F16 and BF16. This is the first PR to add `F16` and `BF16` support to the sparse codegen. There are still problems in supporting these two data types, such as `BF16` is not quite working yet. Add tests cases. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D127010	2022-06-08 09:51:05 -07:00
lorenzo chelini	a0fc94ab61	[MLIR][Math] Add round operation Introduce RoundOp in the math dialect. The operation rounds the operand to the nearest integer value in floating-point format. RoundOp lowers to LLVM intrinsics 'llvm.intr.round' or as a function call to libm (round or roundf). Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D127286	2022-06-08 13:07:39 +02:00
Matthias Springer	032be23309	[mlir][bufferize] Improve buffer writability analysis Find writability conflicts (writes to buffers that are not allowed to be written to) by checking SSA use-def chains. This is better than the current writability analysis, which is too conservative and finds false positives. Differential Revision: https://reviews.llvm.org/D127256	2022-06-08 10:11:52 +02:00
lorenzo chelini	d48479791f	[MLIR][SCF] Improve doc (NFC)	2022-06-08 08:46:36 +02:00
Aart Bik	7482cd6869	[mlir][sparse] updated our sparse dialect doc with some recent changes The `init` and `tensor` ops are renamed (and one moved to another dialect). Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127169	2022-06-07 14:27:57 -07:00
Christopher Bate	53fe155b3f	Revert "[mlir][vector] Allow unroll of contraction in arbitrary order" Reverts commit 1469ebf8382107e0344173f362b690d19e24029d (original commit) Reverts commit a392a39f75af586e3d3cd046a8361939277e067f (build fix for above commit) The commit broke tests in out-of-tree projects, indicating that some logical error was made in the previous change but not covered by current tests.	2022-06-07 14:54:01 -06:00
Kiran Chandramohan	dd32bf9a77	[Flang,MLIR,OpenMP] Fix a few tests that were not converting to LLVM A few OpenMP tests were retaining the FIR operands even after running the LLVM conversion pass. To fix these tests the legality checkes for OpenMP conversion are made stricter to include operands and results. The Flush, Single and Sections operations are added to conversions or legality checks. The RegionLessOpConversion is appropriately renamed to clarify that it works only for operations with Variable operands. The operands of the flush operation are changed to match those of Variable Operands. Fix for an OpenMP issue mentioned in https://github.com/llvm/llvm-project/issues/55210. Reviewed By: shraiysh, peixin, awarzynski Differential Revision: https://reviews.llvm.org/D127092	2022-06-07 09:55:53 +00:00
Alex Zinenko	3326eddcd1	[mlir] fix documentation format in SCF Four leading spaces are interpreted as a code block in markdown. Unless used consistently in ODS op description, they cannot be stripped away by the tablegen backend, which results in malformed markdown being generated.	2022-06-07 11:51:24 +02:00
lorenzo chelini	9b3712e0bf	[MLIR][LLVMIR] Add round intrinsic Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126879	2022-06-07 10:27:55 +02:00
lewuathe	62a34f6a6f	[mlir][complex] Add complex.conj op Add complex.conj op to calculate the complex conjugate which is widely used for the mathematical operation on the complex space. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D127181	2022-06-07 09:38:35 +02:00
Georgios Pinitas	3bcaf2eb93	[mlir][tosa] Moves constant folding operations out of the Canonicalizer Transpose operations on constant data were getting folded during the canonicalization process. This has compile time cost proportional to the constant size. Moving this to a separate pass to enable optionality and flexibility of how such scenarios can be handled. Reviewed By: rsuderman, jpienaar, stellaraccident Differential Revision: https://reviews.llvm.org/D124685	2022-06-06 22:10:22 +00:00
Christopher Bate	1469ebf838	[mlir][vector] Allow unroll of contraction in arbitrary order Adds supprot for vector unroll transformations to unroll in different orders. For example, the `vector.contract` can be unrolled into a smaller set of contractions. There is a choice of how to unroll the decomposition based on the traversal order of (dim0, dim1, dim2). The choice of traversal order can now be specified by a callback which given by the caller of the transform. For now, only the `vector.contract`, `vector.transfer_read/transfer_write` operations support the callback. Differential Revision: https://reviews.llvm.org/D127004	2022-06-06 14:31:04 -06:00
Christopher Bate	cca662b849	[mlir][linalg] add conv_2d_nhwc_fhwc named op This operation should be supported as a named op because when the operands are viewed as having canonical layouts with decreasing strides, then the "reduction" dimensions of the filter (h, w, and c) are contiguous relative to each output channel. When lowered to a matrix multiplication, this layout is the simplest to deal with, and thus future transforms/vectorizations of `conv2d` may find using this named op convenient. Differential Revision: https://reviews.llvm.org/D126995	2022-06-06 13:18:08 -06:00
jacquesguan	ad44495ad3	[mlir][NFC] Replace some llvm::find with llvm::is_contained. This patch replaces some llvm::find with llvm::is_contained, it should be more clear. Differential Revision: https://reviews.llvm.org/D127077	2022-06-06 03:01:14 +00:00
Christian Sigg	400fef081a	Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This change rolls bcfc0a9051014437b55ab932d9aca5ecdca6776b forward (i.e., reverting 369ce54bb302f209239b8ebc77ad824add9df089) with fixed CMakeLists.txt.	2022-06-05 09:11:43 +02:00
Mehdi Amini	369ce54bb3	Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This reverts commit bcfc0a9051014437b55ab932d9aca5ecdca6776b. The build is broken with shared library enabled.	2022-06-04 08:35:45 +00:00
Christian Sigg	bcfc0a9051	[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration. This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% in average, sometimes more) because: - it performs less Newton iterations - it avoids the slow path for e.g. denormals - it allows reuse of the reciprocal for multiple divisions by the same divisor Test program: ``` #include <stdio.h> #include "cuda_fp16.h" // This is a variant of CUDA's own __hdiv which is fast than hdiv_promote below // and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values. __device__ half hdiv_newton(half a, half b) { float fa = __half2float(a); float fb = __half2float(b); float rcp; asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb)); float result = fa * rcp; auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000; if (exponent != 0 && exponent != 0x7f800000) { float err = __fmaf_rn(-fb, result, fa); result = __fmaf_rn(rcp, err, result); } return __float2half(result); } // Surprisingly, this is faster than CUDA's own __hdiv. __device__ half hdiv_promote(half a, half b) { return __float2half(__half2float(a) / __half2float(b)); } // This is an approximation that is accurate up to 1 ulp. __device__ half hdiv_approx(half a, half b) { float fa = __half2float(a); float fb = __half2float(b); float result; asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb)); return __float2half(result); } __global__ void CheckCorrectness() { int i = threadIdx.x + blockIdx.x * blockDim.x; half x = reinterpret_cast<const half&>(i); for (int j = 0; j < 65536; ++j) { half y = reinterpret_cast<const half&>(j); half d1 = hdiv_newton(x, y); half d2 = hdiv_promote(x, y); auto s1 = reinterpret_cast<const short&>(d1); auto s2 = reinterpret_cast<const short&>(d2); if (s1 != s2) { printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n", __half2float(x), i, __half2float(y), j, __half2float(d1), s1, __half2float(d2), s2); //__trap(); } } } __device__ half dst; __global__ void ProfileBuiltin(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = x / x; } dst = x; } __global__ void ProfilePromote(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = hdiv_promote(x, x); } dst = x; } __global__ void ProfileNewton(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = hdiv_newton(x, x); } dst = x; } __global__ void ProfileApprox(half x) { #pragma unroll 1 for (int i = 0; i < 10000000; ++i) { x = hdiv_approx(x, x); } dst = x; } int main() { CheckCorrectness<<<256, 256>>>(); half one = __float2half(1.0f); ProfileBuiltin<<<1, 1>>>(one); // 1.001s ProfilePromote<<<1, 1>>>(one); // 0.560s ProfileNewton<<<1, 1>>>(one); // 0.508s ProfileApprox<<<1, 1>>>(one); // 0.304s auto status = cudaDeviceSynchronize(); printf("%s\n", cudaGetErrorString(status)); } ``` Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D126158	2022-06-04 08:03:29 +02:00
wren romano	3cf03f1c56	[mlir][sparse] Adding IsSparseTensorPred and updating ops to use it Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126994	2022-06-03 17:15:31 -07:00
Diego Caballero	9a79b1b04c	[mlir] Add peeling xform to Codegen Strategy This patch adds the knobs to use peeling in the codegen strategy infrastructure. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D126842	2022-06-03 21:31:43 +00:00
Krzysztof Drewniak	95aff23e29	Re-land "[mlir] Add integer range inference analysis"" This reverts commit 4e5ce2056e3e85f109a074e80bdd23a10ca2bed9. This relands commit 1350c9887dca5ba80af8e3c1e61b29d6696eb240. Reinstates the range analysis with the build issue fixed. Differential Revision: https://reviews.llvm.org/D126926	2022-06-03 17:13:48 +00:00
Nicolas Vasilache	72de7588cc	[mlir][SCF] Add bufferization hook for scf.foreach_thread and terminator. `scf.foreach_thread` results alias with the underlying `scf.foreach_thread.parallel_insert_slice` destination operands and they bufferize to equivalent buffers in the absence of other conflicts. `scf.foreach_thread.parallel_insert_slice` conflict detection is similar to `tensor.insert_slice` conflict detection. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D126769	2022-06-03 07:14:05 +00:00
Thomas Raoux	205c08b54d	[mlir][scf] Add option to loop pipelining to not peel the epilogue Add an option to predicate the epilogue within the kernel instead of peeling the epilogue. This is a useful option to prevent generating large amount of code for deep pipeline. This currently require a user lamdba to implement operation predication. Differential Revision: https://reviews.llvm.org/D126753	2022-06-03 04:20:20 +00:00
River Riddle	ee1cf1f645	[mlir][NFC] Simplify the various `parseSourceFile<T>` overloads These effectively all share the same implementation, i.e. forward to the non-templated overload and then construct the container op.	2022-06-02 19:18:55 -07:00
River Riddle	bf352e0b2e	[mlir:PDLL] Add better support for providing Constraint/Pattern/Rewrite documentation This commit enables providing long-form documentation more seamlessly to the LSP by revamping decl documentation. For ODS imported constructs, we now also import descriptions and attach them to decls when possible. For PDLL constructs, the LSP will now try to provide documentation by parsing the comments directly above the decls location within the source file. This commit also adds a new parser flag `enableDocumentation` that gates the import and attachment of ODS documentation, which is unnecessary in the normal build process (i.e. it should only be used/consumed by tools). Differential Revision: https://reviews.llvm.org/D124881	2022-06-02 16:31:07 -07:00
Arjun P	8bc2cff95a	[MLIR][Presburger] Simplex: remove redundant member vars nRow, nCol Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126790	2022-06-03 00:30:48 +01:00
Mehdi Amini	4e5ce2056e	Revert "[mlir] Add integer range inference analysis" This reverts commit 1350c9887dca5ba80af8e3c1e61b29d6696eb240. Shared library build is broken with undefined references.	2022-06-02 21:24:06 +00:00

1 2 3 4 5 ...

6210 Commits