llvm-project

Author	SHA1	Message	Date
jacquesguan	bc37077947	[mlir][Vector] Add constant folder for extractelement. This revision adds constant folder for vector.extractelement. Differential Revision: https://reviews.llvm.org/D122886	2022-04-02 11:10:42 +08:00
jacquesguan	262823612d	[mlir][Vector] Add constant folder for insertelement. This revision adds constant folder for vector.insertelement. Differential Revision: https://reviews.llvm.org/D122721	2022-04-02 10:20:19 +08:00
Lei Zhang	a480d75fe4	[mlir][vector] Fold transpose(broadcast(<scalar>)) For such cases, the transpose op can be elided. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D122903	2022-04-01 14:51:36 -04:00
Lei Zhang	57b101bdec	[mlir][vector] Handle scalars in extract_strided_slice(broadcast) For such cases we cannot generate extract_strided_slice ops. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D122902	2022-04-01 12:07:47 -04:00
jacquesguan	01ad70fd1d	[mlir][Vector] Fold ShuffleOp if result is identical to one of source vectors. For example, we could do the following eliminations: fold vector.shuffle V1, V2, [0, 1, 2, 3] : <4xi32>, <2xi32> -> V1 fold vector.shuffle V1, V2, [4, 5] : <4xi32>, <2xi32> -> V2 Differential Revision: https://reviews.llvm.org/D122706	2022-03-31 10:46:13 +08:00
Javier Setoain	a75a46db89	[mlir][Vector] Enable create_mask for scalable vectors The way vector.create_mask is currently lowered is vector-length-dependent, and therefore incompatible with scalable vector types. This patch adds an alternative lowering path for create_mask operations that return a scalable vector mask. Differential Revision: https://reviews.llvm.org/D118248	2022-03-25 10:48:59 +00:00
River Riddle	3655069234	[mlir] Move the Builtin FuncOp to the Func dialect This commit moves FuncOp out of the builtin dialect, and into the Func dialect. This move has been planned in some capacity from the moment we made FuncOp an operation (years ago). This commit handles the functional aspects of the move, but various aspects are left untouched to ease migration: func::FuncOp is re-exported into mlir to reduce the actual API churn, the assembly format still accepts the unqualified `func`. These temporary measures will remain for a little while to simplify migration before being removed. Differential Revision: https://reviews.llvm.org/D121266	2022-03-16 17:07:03 -07:00
Matthias Springer	de5022c7d7	[mlir][vector] Implement unrolling of ReductionOp Differential Revision: https://reviews.llvm.org/D121597	2022-03-15 01:21:24 +09:00
Thomas Raoux	f69175b1e6	[mlir][vector] Add unrolling pattern for multidim_reduce op Implement the vectorLoopUnroll interface for MultiDimReduceOp and add a pattern to do the unrolling following the same interface other vector unroll patterns. Differential Revision: https://reviews.llvm.org/D121263	2022-03-14 15:22:24 +00:00
gysit	7294be2b8e	[mlir][linalg] Replace linalg.fill by OpDSL variant. The revision removes the linalg.fill operation and renames the OpDSL generated linalg.fill_tensor operation to replace it. After the change, all named structured operations are defined via OpDSL and there are no handwritten operations left. A side-effect of the change is that the pretty printed form changes from: ``` %1 = linalg.fill(%cst, %0) : f32, tensor<?x?xf32> -> tensor<?x?xf32> ``` changes to ``` %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<?x?xf32>) -> tensor<?x?xf32> ``` Additionally, the builder signature now takes input and output value ranges as it is the case for all other OpDSL operations: ``` rewriter.create<linalg::FillOp>(loc, val, output) ``` changes to ``` rewriter.create<linalg::FillOp>(loc, ValueRange{val}, ValueRange{output}) ``` All other changes remain minimal. In particular, the canonicalization patterns are the same and the `value()`, `output()`, and `result()` methods are now implemented by the FillOpInterface. Depends On D120726 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D120728	2022-03-14 10:51:08 +00:00
Diego Caballero	f71f9958b9	[mlir][Vector] Modernize default lowering of vector transpose This patch removes an old recursive implementation to lower vector.transpose to extract/insert operations and replaces it with an iterative approach that leverages newer linearization/delinearization utilities. The patch should be NFC except for the order in which the extract/insert ops are generated. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D121321	2022-03-10 22:33:14 +00:00
Hanhan Wang	1538bd518c	[mlir][Vector] Add patterns to reorder elementwise ops and broadcast/transpose ops. In quantized computation, there are casting ops around computation ops. Reorder the ops to make reduce-to-contract actually work. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D120760	2022-03-07 12:52:12 -08:00
Diego Caballero	917d95fc8a	[mlir][Vector] Improve default lowering of vector transpose operations The default lowering of vector transpose operations generates a large sequence of scalar extract/insert operations, one pair for each scalar element in the input tensor. In other words, the vector transpose is scalarized. However, there are transpose patterns where one or more adjacent high-order dimensions are not transposed (for example, in the transpose pattern [1, 0, 2, 3], dimensions 2 and 3 are not transposed). This patch improves the lowering of those cases by not scalarizing them and extracting/ inserting a full n-D vector, where 'n' is the number of adjacent high-order dimensions not being transposed. By doing so, we prevent the scalarization of the code and generate a more performant vector version. Paradoxically, this patch shouldn't improve the performance of transpose operations if we are using LLVM. The LLVM pipeline is able to optimize away some of the extract/insert operations and the SLP vectorizer is converting the scalar operations back to its vector form. However, scalarizing a vector version of the code in MLIR and relying on the SLP vectorizer to reconstruct the vector code again is highly undesirable for several reasons. Reviewed By: nicolasvasilache, ThomasRaoux Differential Revision: https://reviews.llvm.org/D120601	2022-03-07 17:56:02 +00:00
Diego Caballero	875bbce9f7	[mlir][Vector] Prevent AVX2 lowering for non-f32 transpose ops The AVX2 lowering for transpose operations is only applicable to f32 vector types. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D120427	2022-02-25 19:27:32 +00:00
Diego Caballero	d7e0a0846b	[mlir][Vector] Generalize AVX2 transpose lowering to n-D vectors The existing AVX2 lowering patterns for the transpose op only trigger if the input vector is 2-D. This patch extends the patterns to trigger for n-D vectors which are effectively 2-D vectors (e.g., vector<1x4x1x8x1). The main constraint for the generalized AVX2 patterns to be applicable to these vectors is that the dimensions that are greater than one must be transposed. Otherwise, the existing patterns are not applicable. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D119505	2022-02-25 19:27:32 +00:00
Benjamin Kramer	d558540fae	[mlir][Vector] Add return type inference for multi_reduction This subsumes the builder and verifier.	2022-02-18 13:00:42 +01:00
Benjamin Kramer	b47be47ac2	[mlir][Vector] Switch ExtractOp to the declarative assembly format This is a bit awkward since ExtractOp allows both `f32` and `vector<1xf32>` results for a scalar extraction. Allow both, but make inference return the scalar to make this as NFC as possible.	2022-02-18 11:45:59 +01:00
Benjamin Kramer	f0dd818be3	[mlir][Vector] Switch ShuffleOp to the declarative assembly format This also requires implementing return type deduction.	2022-02-18 01:46:58 +01:00
Matthias Springer	73e880fbf1	[mlir][bufferize] Add vector-bufferize pass and remove obsolete patterns from Linalg Bufferize Differential Revision: https://reviews.llvm.org/D119444	2022-02-15 21:25:14 +09:00
Nirvedh	ad9b5a4b8e	[mlir][vector] Add pattern to drop lead unit dim for Contraction Op If the result operand has a unit leading dim it is removed from all operands. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D119206	2022-02-10 09:51:07 -08:00
Matthias Springer	fe0bf7d469	[mlir][vector][NFC] Use CombiningKindAttr instead of StringAttr This makes the op consistent with other ops in vector dialect. Differential Revision: https://reviews.llvm.org/D119343	2022-02-10 19:13:29 +09:00
harsh	4a876b13fb	Add case to handle 0-D vectors in FlattenContiguousRowMajorTransferWritePattern and FlattenContiguousRowMajorTransferReadPattern. For 0-D as well as 1-D vectors, both these patterns should return a failure as there is no need to collapse the shape of the source. Currently, only 1-D vectors were handled. This patch handles the 0-D case as well. Reviewed By: Benoit, ThomasRaoux Differential Revision: https://reviews.llvm.org/D119202	2022-02-08 20:00:12 +00:00
Lei Zhang	9dd4c2dcb6	[mlir][vector] Add constant folder for vector.shuffle ops Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D119032	2022-02-04 16:59:32 -05:00
River Riddle	dec8af701f	[mlir] Move SelectOp from Standard to Arithmetic This is part of splitting up the standard dialect. See https://llvm.discourse.group/t/standard-dialect-the-final-chapter/ for discussion. Differential Revision: https://reviews.llvm.org/D118648	2022-02-02 14:45:12 -08:00
River Riddle	6a8ba3186e	[mlir] Split std.splat into tensor.splat and vector.splat This is part of the larger effort to split the standard dialect. This will also allow for pruning some additional dependencies on Standard (done in a followup). Differential Revision: https://reviews.llvm.org/D118202	2022-02-02 14:45:12 -08:00
Nicolas Vasilache	3c3810e72e	[mlir][vector] Avoid hoisting alloca'ed temporary buffers across AutomaticAllocationScope This revision avoids incorrect hoisting of alloca'd buffers across an AutomaticAllocationScope boundary. In the more general case, we will probably need a ParallelScope-like interface. Differential Revision: https://reviews.llvm.org/D118768	2022-02-02 06:00:42 -05:00
gysit	dc82547b17	[mlir][vector] Make write permutation lowering work with tensors. Use type inference when building the TransferWriteOp in the TransferWritePermutationLowering. Previously, the result type has been set to Type() which triggers an assertion if the pattern is used with tensors instead of memrefs. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D118758	2022-02-02 09:21:10 +00:00
Alexander Belyaev	ebc8153786	Revert "Revert "[mlir] Purge `linalg.copy` and use `memref.copy` instead."" This reverts commit 25bf6a2a9bc6ecb3792199490c70c4ce50a94aea.	2022-02-01 18:21:21 +01:00
Alexander Belyaev	25bf6a2a9b	Revert "[mlir] Purge `linalg.copy` and use `memref.copy` instead." This reverts commit 016956b68081705ffee511c334e31e414fa1ddbf. Reverting it to fix NVidia build without being in a hurry.	2022-01-31 18:51:39 +01:00
Alexander Belyaev	016956b680	[mlir] Purge `linalg.copy` and use `memref.copy` instead. Differential Revision: https://reviews.llvm.org/D118028	2022-01-31 18:25:56 +01:00
harsh	80e0bf1af1	Add vector.scan op This patch adds the vector.scan op which computes the scan for a given n-d vector. It requires specifying the operator, the identity element and whether the scan is inclusive or exclusive. TEST: Added test in ops.mlir Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D117171	2022-01-28 20:07:57 +00:00
Sergei Grechanik	5abf116322	[mlir][vector] Allow values outside of [0; dim-size] in create_mask This commit explicitly states that negative values and values exceeding vector dimensions are allowed in vector.create_mask (but not in vector.constant_mask). These values are now truncated when canonicalizing vector.create_mask to vector.constant_mask. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D116069	2022-01-20 09:34:42 -08:00
Nicolas Vasilache	f98025d867	[mlir][Vector] Generalize and improve folding of ExtractOp from Insert/Transpose chain. This revision fixes a bug where the iterative algorithm would walk back def-use chains to an incorrect operand. This exposed opportunities for a larger refactoring and behavior improvement. The new algorithm has improved folding behavior and proceeds by tracking both the permutation of the extraction position and the internal vector permutation. Multiple partial intersection cases with a candidate insertOp are supported. The refactoring of the implementation should also help it generalize to strided insert/extract op. This also subsumes the previous `foldExtractOpFromTranspose` which is now a simple special case and can be deleted. Differential Revision: https://reviews.llvm.org/D117322	2022-01-17 16:05:23 +00:00
Thomas Raoux	b0a309dd7a	[mlir][vector] Add folding for extract + extract/insert_strided Differential Revision: https://reviews.llvm.org/D116785	2022-01-12 11:48:35 -08:00
Nicolas Vasilache	2e69f4f012	[mlir][vector] Fix illegal vector.transfer + tensor.insert/extract_slice folding vector.transfer operations do not have rank-reducing semantics. Bail on illegal rank-reduction: we need to check that the rank-reduced dims are exactly the leading dims. I.e. the following is illegal: ``` %0 = vector.transfer_write %v, %t[0,0], %cst : vector<2x4xf32>, tensor<2x4xf32> %1 = tensor.insert_slice %0 into %tt[0,0,0][2,1,4][1,1,1] : tensor<2x4xf32> into tensor<2x1x4xf32> ``` Cannot fold into: ``` %0 = vector.transfer_write %v, %t[0,0,0], %cst : vector<2x4xf32>, tensor<2x1x4xf32> ``` For this, check the trailing `vectorRank` dims of the insert_slice result tensor match the trailing dims of the inferred result tensor. Differential Revision: https://reviews.llvm.org/D116409	2021-12-30 14:55:16 +00:00
MaheshRavishankar	7df7586a0b	[mlir][MemRef] Deprecate unspecified trailing offset, size, and strides semantics of `OffsetSizeAndStrideOpInterface`. The semantics of the ops that implement the `OffsetSizeAndStrideOpInterface` is that if the number of offsets, sizes or strides is less than the rank of the source, then some default values are filled along the trailing dimensions (0 for offset, source dimension for sizes, and 1 for strides). This is confusing, especially with rank-reducing semantics. The immediate issue here is that the methods of `OffsetSizeAndStrideOpInterface` assume that the number of values is the same as the source rank. This causes out-of-bounds errors. So simplify the specification of `OffsetSizeAndStrideOpInterface` to make it invalid to specify a number of offsets/sizes/strides not equal to the source rank. Differential Revision: https://reviews.llvm.org/D115677	2021-12-29 11:18:29 -08:00
Hanhan Wang	501674dc3b	[mlir][Vector] Further fix to avoid infinite loop in InnerOuterDimReductionConversion If all the dims are reduction dims, it is already in inner-most/outer-most reduction form. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D115820	2021-12-15 13:54:15 -08:00
Javier Setoain	a4830d14ed	[mlir][RFC] Add scalable dimensions to VectorType With VectorType supporting scalable dimensions, we don't need many of the operations currently present in ArmSVE, like mask generation and basic arithmetic instructions. Therefore, this patch also gets rid of those. Having built-in scalable vector support also simplifies the lowering of scalable vector dialects down to LLVMIR. Scalable dimensions are indicated with the scalable dimensions between square brackets: vector<[4]xf32> is a scalable vector of 4 single precision floating point elements. More generally, a VectorType can have a set of fixed-length dimensions followed by a set of scalable dimensions: vector<2x[4x4]xf32> is a vector with 2 scalable 4x4 vectors of single precision floating point elements. The scale of the scalable dimensions can be obtained with the Vector operation: %vs = vector.vscale This change is being discussed in the discourse RFC: https://llvm.discourse.group/t/rfc-add-built-in-support-for-scalable-vector-types/4484 Differential Revision: https://reviews.llvm.org/D111819	2021-12-15 09:31:37 +00:00
Benoit Jacob	aba437ceb2	[mlir][Vector] Patterns flattening vector transfers to 1D This is the second part of https://reviews.llvm.org/D114993 after slicing into 2 independent commits. This is needed at the moment to get good codegen from 2d vector.transfer ops that aim to compile to SIMD load/store instructions but that can only do so if the whole 2d transfer shape is handled in one piece, in particular taking advantage of the memref being contiguous rowmajor. For instance, if the target architecture has 128bit SIMD then we would expect that contiguous row-major transfers of <4x4xi8> map to one SIMD load/store instruction each. The current generic lowering of multi-dimensional vector.transfer ops can't achieve that because it peels dimensions one by one, so a transfer of <4x4xi8> becomes 4 transfers of <4xi8>. The new patterns here are only enabled for now by -test-vector-transfer-flatten-patterns. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D114993	2021-12-13 22:39:41 +00:00
Benoit Jacob	0aea49a730	[mlir][Vector] Patterns flattening vector transfers to 1D This is the first part of https://reviews.llvm.org/D114993 which has been split into small independent commits. This is needed at the moment to get good codegen from 2d vector.transfer ops that aim to compile to SIMD load/store instructions but that can only do so if the whole 2d transfer shape is handled in one piece, in particular taking advantage of the memref being contiguous rowmajor. For instance, if the target architecture has 128bit SIMD then we would expect that contiguous row-major transfers of <4x4xi8> map to one SIMD load/store instruction each. The current generic lowering of multi-dimensional vector.transfer ops can't achieve that because it peels dimensions one by one, so a transfer of <4x4xi8> becomes 4 transfers of <4xi8>. The new patterns here are only enabled for now by -test-vector-transfer-flatten-patterns. Reviewed By: nicolasvasilache	2021-12-13 21:49:04 +00:00
Nicolas Vasilache	408553dd96	[mlir][Vector] Support 0-D vectors in `CreateMaskOp` The 0-D case gets lowered in almost the same way that the 1-D case does in VectorCreateMaskOpConversion. I also had to slightly update the verifier for the op to always require exactly 1 operand in the 0-D case. Depends On D115220 Reviewed by: ftynse Differential revision: https://reviews.llvm.org/D115221	2021-12-12 13:32:29 +00:00
MaheshRavishankar	9cfd8d7c6c	[mlir][Vector] Avoid infinite loop in InnerOuterDimReductionConversion. This pattern tries to convert an inner (outer) dim reduction to an outer (inner) dim reduction. Doing this on a 1D or 0D vector results in an infinite loop since the converted op is the same as the original operation. Just returning failure when source rank <= 1 fixes the issue. Differential Revision: https://reviews.llvm.org/D115426	2021-12-09 09:30:05 -08:00
Mehdi Amini	ee0908703d	Change the printing/parsing behavior for Attributes used in declarative assembly format The new form of printing attribute in the declarative assembly is eliding the `#dialect.mnemonic` prefix to only keep the `<....>` part. Differential Revision: https://reviews.llvm.org/D113873	2021-12-08 02:02:37 +00:00
Michal Terepeta	caf89c0db6	[mlir][Vector] Support 0-D vectors in `ConstantMaskOp` To support creating both a mask with just a single `true` and `false` values, I had to relax the restriction in the verifier that the rank is always equal to the length of the attribute array, in other words, we now allow: - `vector.constant_mask [0] : vector<i1>` which gets lowered to `arith.constant dense<false> : vector<i1>` - `vector.constant_mask [1] : vector<i1>` which gets lowered to `arith.constant dense<true> : vector<i1>` (the attribute list for the 0-D case must be a singleton containing either `0` or `1`) Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D115023	2021-12-06 08:03:04 +00:00
Michal Terepeta	1423e8bf5d	[mlir][Vector] Support 0-D vectors in `BitCastOp` The implementation only allows to bit-cast between two 0-D vectors. We could probably support casting from/to vectors like `vector<1xf32>`, but I wasn't convinced that this would be important and it would require breaking the invariant that `BitCastOp` works only on vectors with equal rank. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D114854	2021-12-03 08:55:59 +00:00
Michal Terepeta	8e2b373396	[mlir][Vector] Add some missing tests for `broadcast` and `splat` Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D114853	2021-12-03 08:52:51 +00:00
Nicolas Vasilache	c537a94334	[mlir][Vector] Thread 0-d vectors through vector.transfer ops This revision adds 0-d vector support to vector.transfer ops. In the process, numerous cleanups are applied, in particular around normalizing and reducing the number of builders. Reviewed By: ThomasRaoux, springerm Differential Revision: https://reviews.llvm.org/D114803	2021-12-01 16:49:43 +00:00
Nicolas Vasilache	3ff4e5f2a4	[mlir][Vector] Thread 0-d vectors through InsertElementOp. This revision makes concrete use of 0-d vectors to extend the semantics of InsertElementOp. Reviewed By: dcaballe, pifon2a Differential Revision: https://reviews.llvm.org/D114388	2021-11-23 12:55:11 +00:00
Nicolas Vasilache	e7026aba00	[mlir][Vector] Thread 0-d vectors through ExtractElementOp. This revision starts making concrete use of 0-d vectors to extend the semantics of ExtractElementOp. In the process a new VectorOfAnyRank Tablegen OpBase.td is added to allow progressive transition to supporting 0-d vectors by gradually opting in. Differential Revision: https://reviews.llvm.org/D114387	2021-11-23 12:39:44 +00:00
Nicolas Vasilache	b2729fda60	[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm) This revision follows up on the conversation titled: ```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths``` The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between an intrinsics and an inline_asm implementation. This results in roughly 20% fewer cycles as reported by llvm-mca: After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted): ``` Iterations: 100 Instructions: 5900 Total Cycles: 2415 Total uOps: 7300 Dispatch Width: 6 uOps Per Cycle: 3.02 IPC: 2.44 Block RThroughput: 24.0 Cycles with backend pressure increase [ 89.90% ] Throughput Bottlenecks: Resource Pressure [ 89.65% ] - SKXPort1 [ 0.04% ] - SKXPort2 [ 12.42% ] - SKXPort3 [ 12.42% ] - SKXPort5 [ 89.52% ] Data Dependencies: [ 37.06% ] - Register Dependencies [ 37.06% ] - Memory Dependencies [ 0.00% ] ``` After this revision (inline_asm version, vblendps instructions are indeed emitted): ``` Iterations: 100 Instructions: 6300 Total Cycles: 2015 Total uOps: 7700 Dispatch Width: 6 uOps Per Cycle: 3.82 IPC: 3.13 Block RThroughput: 20.0 Cycles with backend pressure increase [ 83.47% ] Throughput Bottlenecks: Resource Pressure [ 83.18% ] - SKXPort0 [ 14.49% ] - SKXPort1 [ 14.54% ] - SKXPort2 [ 19.70% ] - SKXPort3 [ 19.70% ] - SKXPort5 [ 83.03% ] - SKXPort6 [ 14.49% ] Data Dependencies: [ 39.75% ] - Register Dependencies [ 39.75% ] - Memory Dependencies [ 0.00% ] ``` An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0). Differential Revision: https://reviews.llvm.org/D114393	2021-11-23 07:31:22 +00:00
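
Note on commit 01ad70fd1d (Fold ShuffleOp if result is identical to one of source vectors): the rule described in that message can be sketched roughly as below. This is only an illustration in Python, not the MLIR C++ folder itself; the function name `fold_shuffle` and the string results "V1"/"V2" are hypothetical stand-ins, and the sketch assumes the mask indexes the concatenation of V1 followed by V2, as the two examples in the commit message suggest.

```python
from typing import List, Optional


def fold_shuffle(mask: List[int], v1_len: int, v2_len: int) -> Optional[str]:
    """Return "V1" or "V2" if the shuffle just reproduces that source, else None.

    Assumption: mask indices address the concatenation [V1 | V2], so V1
    occupies positions 0..v1_len-1 and V2 occupies v1_len..v1_len+v2_len-1.
    """
    # Identity over the first source: the result is exactly V1.
    if mask == list(range(v1_len)):
        return "V1"
    # Identity over the second source: the result is exactly V2.
    if mask == list(range(v1_len, v1_len + v2_len)):
        return "V2"
    # Any other mask mixes, reorders, or truncates elements: no fold.
    return None


# The two examples from the commit message, plus a mask that must not fold.
print(fold_shuffle([0, 1, 2, 3], 4, 2))
print(fold_shuffle([4, 5], 4, 2))
print(fold_shuffle([0, 1], 4, 2))
```

A mask such as [0, 1] is rejected here because the result would be shorter than V1, so it is not identical to either source.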

213 Commits