llvm-project

Author	SHA1	Message	Date
Alexander Belyaev	f6fb0a4f35	[mlir] Make patterns for folding tensor.empty optional. At the moment, they are a part of EmptyOp::getCanonicalizationPatterns. When extract_slice(tensor.empty) is rewritten as a new tensor.empty, it could happen that we end up with two tensor.empty ops, since the original tensor.empty can have two users. After bufferization such cases result in two allocations. Differential Revision: https://reviews.llvm.org/D139308	2022-12-07 23:01:34 +01:00
Matthias Springer	50a2bb95ab	[mlir][tensor] Fold rank-reducing extract_slice with inverse expand_shape Differential Revision: https://reviews.llvm.org/D139220	2022-12-05 09:17:24 +01:00
Matthias Springer	f92c7506e3	Revert "[mlir][tensor] Fold rank-reducing extract_slice with inverse expand_shape" This reverts commit a076f57a1a6b6d775aa4f11ac678d1c43ab33fb1.	2022-12-02 21:22:20 +01:00
Matthias Springer	a076f57a1a	[mlir][tensor] Fold rank-reducing extract_slice with inverse expand_shape Differential Revision: https://reviews.llvm.org/D139103	2022-12-02 10:42:46 +01:00
Christian Sigg	be065c41d8	[mlir] Change scf::LoopNest to store 'results'. This fixes the case where scf::LoopNest::loops is empty. Change LoopVector and ValueVector to SmallVector. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D136926	2022-12-01 06:51:45 +01:00
Guray Ozen	6663f34704	[mlir] Introduce device mapper attribute for `thread_dim_map` and `mapped to dims` `scf.foreach_thread` defines the mapping of its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces a device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it. ``` scf.foreach_thread (%i, %j) in (%c1, %c2) { scf.foreach_thread (%i2, %j2) in (%c1, %c2) {...} { thread_dim_mapping = [0, 1]} } { thread_dim_mapping = [0, 1]} ``` It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This gives us clear validation. The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x/y/z>`). After this change, the above code prints as below, which clarifies the loop mappings. The change also implements consumption of these two new attributes by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads. ``` scf.foreach_thread (%i, %j) in (%c1, %c2) { scf.foreach_thread (%i2, %j2) in (%c1, %c2) {...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]} } { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]} ``` Reviewed By: ftynse, nicolasvasilache Differential Revision: https://reviews.llvm.org/D137413	2022-11-11 08:44:57 +01:00
Christopher Bate	446981bdb6	[mlir][tensor] ExtractSliceFromReshape: handle collapsing of unit dim edge cases Prior to this change, the "ExtractSliceFromReshape" pattern would transform ``` %collapsed = tensor.collapse_shape %input [[0, 1], [2]] : tensor<1x11x100xf32> into tensor<11x100xf32> %slice = tensor.extract_slice %collapsed [%offt, 0] [%size, 100] [1, 1] : tensor<11x100xf32> to tensor<?x100xf32> ``` into a loop that iterated over the range `%size - %offt`, piecing together multiple sub-slices of `%input` along the first dimension. This is correct but obviously inefficient. The technical condition is that collapsing at-most-one non-unit dimension of `%src` will not result in a subsequent slice along the corresponding dimension of `%collapsed` mapping across discontinuities in the index space of `%src`. Thus, the definition of a "linearized dimension" (from the perspective of `tensor.collapse_shape`) is updated to reflect this condition. The transform will now generate ``` %slice = tensor.extract_slice %input [0, %offt, 0] [1, %size, 100] [1, 1, 1] : tensor<1x11x100xf32> to tensor<1x?x100xf32> %result = tensor.collapse_shape %slice [[0, 1], [2]] : tensor<1x?x100xf32> into tensor<?x100xf32> ``` which can be further canonicalized. Additional tests are added to check this family of edge cases. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D135726	2022-10-22 13:29:34 -06:00
Matthias Springer	81ca5aa452	[mlir][tensor][NFC] Rename linalg.init_tensor to tensor.empty tensor.empty/linalg.init_tensor produces an uninitialized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`). This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect. RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101 Differential Revision: https://reviews.llvm.org/D135129	2022-10-04 17:25:35 +09:00
Jakub Kuderski	abc362a107	[mlir][arith] Change dialect name from Arithmetic to Arith Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22. Tested with: `ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples` and `bazel build --config=generic_clang @llvm-project//mlir:all`. Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini Differential Revision: https://reviews.llvm.org/D134762	2022-09-29 11:23:28 -04:00
Lei Zhang	bb4c53b7ba	[mlir][tensor] Merge consecutive insert_slice/extract_slice ops Consecutive tensor.insert_slice/tensor.extract_slice ops can be created in cases like tiling convolution and then downsizing 2-D convolutions into 1-D ones. They hinder further transformations, so this change adds patterns to clean them up. Given that bufferization is sensitive and has requirements over the IR structure (see https://reviews.llvm.org/D132666), these patterns are put in Transforms/ with separate entry points for explicit collection. Reviewed By: ThomasRaoux, mravishankar Differential Revision: https://reviews.llvm.org/D133871	2022-09-20 19:52:56 -04:00
Christopher Bate	4d27f06f94	[mlir][Tensor] Fix ExtractSliceFromReshape transform edge case The transformation would fail if none of the sliced dimensions were linearized by the producing `tensor.collapse_shape`. This is a trivial edge case but it wasn't correctly tested. Fixes the issue and adds a test. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D134088	2022-09-19 14:02:45 -06:00
Christopher Bate	f4a478cd01	[mlir][Tensor] Add rewrites to extract slices through `tensor.collape_shape` This change adds a set of utilities to replace the result of a `tensor.collapse_shape -> tensor.extract_slice` chain with the equivalent result formed by aggregating slices of the `tensor.collapse_shape` source. In general, it is not possible to commute `extract_slice` and `collapse_shape` if linearized dimensions are sliced. The i-th dimension of the `tensor.collapse_shape` result is a "linearized sliced dimension" if: 1) Reassociation indices of tensor.collapse_shape in the i'th position is greater than size 1 (multiple dimensions of the input are collapsed) 2) The i-th dimension is sliced by `tensor.extract_slice`. We can work around this by stitching together the result of `tensor.extract_slice` by iterating over any linearized sliced dimensions. This is equivalent to "tiling" the linearized-and-sliced dimensions of the `tensor.collapse_shape` operation in order to manifest the result tile (the result of the `tensor.extract_slice`). The user of the utilities must provide the mechanism to create the tiling (e.g. a loop). In the tests, it is demonstrated how to apply the utilities using either `scf.for` or `scf.foreach_thread`. The below example illustrates the pattern using `scf.for`: ``` %0 = linalg.generic ... -> tensor<3x7x11x10xf32> %1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<231x10xf32> %2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32> ``` We can construct %2 by generating the following IR: ``` %dest = linalg.init_tensor() : tensor<10x10xf32> %2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> { // Step 1: Map this output idx (%iv) to a multi-index for the input (%3): %linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 13)>(%iv) %3:3 = arith.delinearize_index %linear_index into (3, 7, 11) // Step 2: Extract the slice from the input %4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] : tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32> %5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] : tensor<1x1x1x10xf32> into tensor<1x10xf32> // Step 3: Insert the slice into the destination %6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] : tensor<1x10xf32> into tensor<10x10xf32> scf.yield %6 : tensor<10x10xf32> } ``` The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D129699	2022-09-08 21:58:21 -06:00
Mehdi Amini	0b1aee38bd	Revert "[mlir][Tensor] Add rewrites to extract slices through `tensor.collape_shape`" This reverts commit 5711957875738c1318f89afd7bf4be388f85a087. A circular dependency is introduced here from Dialect/Utils/ to the ViewLikeInterface, but it already depends on Dialect/Utils. Also this introduces a dependency from lib/Dialect/Tensor to Linalg, which isn't obviously correct from a layering point of view.	2022-09-02 23:34:52 +00:00
Christopher Bate	5711957875	[mlir][Tensor] Add rewrites to extract slices through `tensor.collape_shape` This change adds a set of utilities to replace the result of a `tensor.collapse_shape -> tensor.extract_slice` chain with the equivalent result formed by aggregating slices of the `tensor.collapse_shape` source. In general, it is not possible to commute `extract_slice` and `collapse_shape` if linearized dimensions are sliced. The i-th dimension of the `tensor.collapse_shape` result is a "linearized sliced dimension" if: 1) Reassociation indices of tensor.collapse_shape in the i'th position is greater than size 1 (multiple dimensions of the input are collapsed) 2) The i-th dimension is sliced by `tensor.extract_slice`. We can work around this by stitching together the result of `tensor.extract_slice` by iterating over any linearized sliced dimensions. This is equivalent to "tiling" the linearized-and-sliced dimensions of the `tensor.collapse_shape` operation in order to manifest the result tile (the result of the `tensor.extract_slice`). The user of the utilities must provide the mechanism to create the tiling (e.g. a loop). In the tests, it is demonstrated how to apply the utilities using either `scf.for` or `scf.foreach_thread`. The below example illustrates the pattern using `scf.for`: ``` %0 = linalg.generic ... -> tensor<3x7x11x10xf32> %1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<231x10xf32> %2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : .... tensor<10x10xf32> ``` We can construct %2 by generating the following IR: ``` %dest = linalg.init_tensor() : tensor<10x10xf32> %2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> { // Step 1: Map this output idx (%iv) to a multi-index for the input (%3): %linear_index = affine.apply affine_map<(d0)[]->(d0*2 + 13)>(%iv) %3:3 = arith.delinearize_index %linear_index into (3, 7, 11) // Step 2: Extract the slice from the input %4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] : tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32> %5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] : tensor<1x1x1x10xf32> into tensor<1x10xf32> // Step 3: Insert the slice into the destination %6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] : tensor<1x10xf32> into tensor<10x10xf32> scf.yield %6 : tensor<10x10xf32> } ``` The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D129699	2022-09-02 11:29:04 -06:00
Jacques Pienaar	04235d07ad	[mlir] Update flipped accessors (NFC) Follow up with memref flipped and flipping any intermediate changes made.	2022-06-28 13:11:26 -07:00
Alex Zinenko	8b68da2c7d	[mlir] move SCF headers to SCF/{IR,Transforms} respectively This aligns the SCF dialect file layout with the majority of the dialects. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D128049	2022-06-20 10:18:01 +02:00
Mogball	e16d13322b	[mlir] (NFC) Clean up bazel and CMake target names All dialect targets in bazel have been named Dialect and all dialect targets in CMake have been named MLIRDialect.	2022-06-13 16:24:15 +00:00
River Riddle	5e50dd048e	[mlir] Rework the implementation of TypeID This commit restructures how TypeID is implemented to ideally avoid the current problems related to shared libraries. This is done by changing the "implicit" fallback path to use the name of the type, instead of using a static template variable (which breaks shared libraries). The major downside to this is that it adds some additional initialization costs for the implicit path. Given the use of type names for uniqueness in the fallback, we also no longer allow types defined in anonymous namespaces to have an implicit TypeID. To simplify defining an ID for these classes, a new `MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID` macro was added to allow for explicitly defining a TypeID directly on an internal class. To help identify when types are using the fallback, `-debug-only=typeid` can be used to log which types are using implicit ids. This change generally only requires changes to the test passes, which are all defined in anonymous namespaces, and thus can't use the fallback any longer. Differential Revision: https://reviews.llvm.org/D122775	2022-04-04 13:52:26 -07:00
River Riddle	87d6bf3728	[mlir][test] Generalize a bunch of FuncOp based passes to run on any operation/interfaces A lot of test passes are currently anchored on FuncOp, but this dependency is generally just historical. A majority of these test passes can run on any operation, or can operate on a specific interface (FunctionOpInterface/SymbolOpInterface). This allows for greatly reducing the API dependency on FuncOp, which is slated to be moved out of the Builtin dialect. Differential Revision: https://reviews.llvm.org/D121191	2022-03-08 12:25:32 -08:00
Mehdi Amini	e1f389a89f	Apply clang-tidy fixes for readability-simplify-boolean-expr to MLIR (NFC)	2022-03-07 10:41:45 +00:00
Okwan Kwon	4c901bf447	[mlir] Match Arithmetic::ConstantOp and Tensor::ExtractSliceOp. Add a pattern matcher for ExtractSliceOp when its source is a constant. The matching heuristics can be governed by the control function since generating a new constant is not always beneficial. Differential Revision: https://reviews.llvm.org/D119605	2022-02-28 23:09:03 +00:00
Okwan Kwon	4f5eb53e68	Revert "[mlir] Fold Arithmetic::ConstantOp and Tensor::ExtractSliceOp." This reverts commit 3104994104f0c2f274acf5e01eb6cc82e9cca06b.	2022-02-28 19:14:05 +00:00
Okwan Kwon	3104994104	[mlir] Fold Arithmetic::ConstantOp and Tensor::ExtractSliceOp. Fold ExtractSliceOp when the source is a constant.	2022-02-28 17:47:29 +00:00
Lei Zhang	e027c00821	[mlir][tensor] Add a pattern to split tensor.pad ops This commit adds a pattern to wrap a tensor.pad op with an scf.if op to separate the cases where we don't need padding (all pad sizes are actually zeros) and where we indeed need padding. This pattern is meant to handle padding inside tiled loops. Under such cases the padding sizes typically depend on the loop induction variables. Splitting them would allow treating perfect tiles and edge tiles separately. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D117018	2022-02-16 13:43:57 -05:00

24 Commits
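
The index arithmetic behind the "extract slices through `tensor.collapse_shape`" commits (f4a478cd01, 5711957875) can be sketched outside MLIR. This is a minimal Python illustration, not the utilities those commits add. Nested lists stand in for tensors, and the names `delinearize` and `slice_through_collapse` are hypothetical. It uses the shapes from the commit message example: a 3x7x11x10 source whose three leading dimensions are collapsed, then sliced with offset 13, size 10, and stride 2.

```python
def delinearize(index, basis):
    """Split a row-major linear index into one index per entry of `basis`."""
    multi = []
    for size in reversed(basis):
        multi.append(index % size)
        index //= size
    return tuple(reversed(multi))


def slice_through_collapse(src, basis, offset, size, stride):
    """Gather rows offset + iv * stride of the collapsed view of `src`
    without building the collapsed view first.

    `src` is a nested list whose leading dimensions have the sizes in `basis`.
    """
    result = []
    for iv in range(size):
        # Step 1: map the output index to a multi-index into the source.
        linear_index = iv * stride + offset
        row = src
        # Step 2: extract the matching row from the uncollapsed source.
        for idx in delinearize(linear_index, basis):
            row = row[idx]
        # Step 3: place the row at position iv of the destination.
        result.append(list(row))
    return result


# Source "tensor" of shape 3x7x11x10; each element encodes its own position.
basis = (3, 7, 11)
src = [[[[((a * 7 + b) * 11 + c) * 10 + k for k in range(10)]
         for c in range(11)]
        for b in range(7)]
       for a in range(3)]

# Reference: collapse the leading dimensions first, then slice.
collapsed = [row for plane in src for line in plane for row in line]
result = slice_through_collapse(src, basis, offset=13, size=10, stride=2)
```

Here `result` matches `collapsed[13:33:2]`, so the loop reproduces the slice of the collapsed view while only ever indexing the original four-dimensional source.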