llvm-project

Author	SHA1	Message	Date
Benjamin Maxwell	c42512436b	[mlir][ArmSME] Rename slice move operations to insert/extract_tile_slice (#106755 ) This renames: - `arm_sme.move_tile_slice_to_vector` to `arm_sme.extract_tile_slice` - `arm_sme.move_vector_to_tile_slice` to `arm_sme.insert_tile_slice` The new names are more consistent with the rest of MLIR and should be easier to understand. The current names (to me personally) are hard to parse and easy to mix up when skimming through code. Additionally, the syntax for `insert_tile_slice` has changed from: ```mlir %4 = arm_sme.insert_tile_slice %0, %1, %2 : vector<[16]xi8> into vector<[16]x[16]xi8> ``` To: ```mlir %4 = arm_sme.insert_tile_slice %0, %1[%2] : vector<[16]xi8> into vector<[16]x[16]xi8> ``` This is for consistency with `extract_tile_slice`, but also helps with readability as it makes it clear which operand is the index.	2024-09-02 11:12:40 +01:00
Benjamin Maxwell	e2296d8295	[mlir][ArmSME] Lower extract from 2D scalable create_mask to psel (#96066 ) Example: ```mlir %mask = vector.create_mask %a, %b : vector<[4]x[8]xi1> %slice = vector.extract %mask[%index] : vector<[8]xi1> from vector<[4]x[8]xi1> ``` Becomes: ```mlir %mask_rows = vector.create_mask %a : vector<[4]xi1> %mask_cols = vector.create_mask %b : vector<[8]xi1> %slice = arm_sve.psel %mask_cols, %mask_rows[%index] : vector<[8]xi1>, vector<[4]xi1> ``` Note: While psel is under ArmSVE it requires SME (or SVE 2.1), so this is currently the most logical place for this lowering.	2024-06-20 10:27:07 +01:00
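The rewrite in e2296d8295 can be sanity-checked with a small model. This is a sketch, not the actual lowering. It assumes `vscale = 1` (so `vector<[4]x[8]xi1>` has 4 rows and 8 columns), mask bounds that stay within the vector size, and the usual reading of `psel` (return the first predicate if the indexed lane of the second is active, otherwise all-false). The helper names `create_mask_1d`, `create_mask_2d`, `psel` and `rewrite_is_equivalent` are hypothetical.

```python
def create_mask_1d(bound, size):
    """Model of a 1-D vector.create_mask: lanes [0, bound) are active."""
    return [i < bound for i in range(size)]


def create_mask_2d(a, b, rows, cols):
    """Model of a 2-D vector.create_mask: (i, j) is active iff i < a and j < b."""
    return [[i < a and j < b for j in range(cols)] for i in range(rows)]


def psel(p1, p2, index):
    """Assumed psel semantics: p1 if p2[index] is active, else all-false."""
    return list(p1) if p2[index] else [False] * len(p1)


def rewrite_is_equivalent(rows, cols):
    """Compare 'extract row of 2-D mask' against the psel form for all in-range bounds."""
    for a in range(rows + 1):
        for b in range(cols + 1):
            mask = create_mask_2d(a, b, rows, cols)
            mask_rows = create_mask_1d(a, rows)
            mask_cols = create_mask_1d(b, cols)
            for index in range(rows):
                # vector.extract %mask[%index] vs arm_sve.psel %mask_cols, %mask_rows[%index]
                if mask[index] != psel(mask_cols, mask_rows, index):
                    return False
    return True
```

Calling `rewrite_is_equivalent(4, 8)` exhaustively compares both forms for the shape used in the commit message.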
Benjamin Maxwell	4d6b9921b3	[mlir][ArmSME] Fold MoveTileSliceToVector + TransferWrite to StoreTileSlice (#95907 )	2024-06-19 12:52:53 +01:00
Cullen Rhodes	b49c0b8abc	[mlir][ArmSME] Simplify permutation map handling (#93515 ) In -convert-vector-to-arm-sme the permutation_map is explicitly checked for transpose when converting xfer ops, but for 2-D vector types the only non-identity permutation map is transpose so this can be simplified.	2024-05-30 13:22:43 +01:00
Cullen Rhodes	bfb5fe218e	[mlir][ArmSME] Fold transpose into xfer read to enable in-flight transpose (#92562 ) vector.transpose ops whose inputs come from vector.transfer_read can be eliminated by folding the transpose into the xfer op to enable in-flight transposition when converting xfer read to arm_sme.tile_load.	2024-05-21 08:08:05 +01:00
Cullen Rhodes	9f7fff7f13	[mlir][ArmSME] Add arith-to-arm-sme conversion pass (#78197 ) Existing 'arith::ConstantOp' conversion and tests are moved from VectorToArmSME. Only a single op is converted at the moment, but this will grow as things like in-tile add are implemented. Also, 'createLoopOverTileSlices' is moved to ArmSME utils since it's relevant for both conversions.	2024-01-22 09:23:11 +00:00
Cullen Rhodes	e7432babaf	[mlir][ArmSME] Fail instead of error in vector.outerproduct lowering (#75447 ) The 'vector.outerproduct' -> 'arm_sme.outerproduct' conversion currently errors on unsupported cases when it should return failure.	2023-12-15 07:30:32 +00:00
Benjamin Maxwell	b0b69fd879	[mlir][ArmSME] More precisely model dataflow in ArmSME to SCF lowerings (#73922 ) Since #73253, loops over tiles in SSA form (i.e. loops that take `iter_args` and yield a new tile) are supported, so this patch updates ArmSME lowerings to this form. This is a NFC, as it still lowers to the same intrinsics, but this makes IR less 'surprising' at a higher-level, and may be recognised by more transforms. Example: IR before: ```mlir scf.for %tile_slice_index = %c0 to %num_tile_slices step %c1 { arm_sme.move_vector_to_tile_slice %broadcast_to_1d, %tile, %tile_slice_index : vector<[4]xi32> into vector<[4]x[4]xi32> } // ... later use %tile ``` IR now: ```mlir %broadcast_to_tile = scf.for %tile_slice_index = %c0 to %num_tile_slices step %c1 iter_args(%iter_tile = %init_tile) -> (vector<[4]x[4]xi32>) { %tile_update = arm_sme.move_vector_to_tile_slice %broadcast_to_1d, %iter_tile, %tile_slice_index : vector<[4]xi32> into vector<[4]x[4]xi32> scf.yield %tile_update : vector<[4]x[4]xi32> } // ... later use %broadcast_to_tile ```	2023-12-06 14:31:05 +00:00
Benjamin Maxwell	10063c5a29	[mlir][ArmSME] Move vector.print -> ArmSME lowering to VectorToArmSME (#74063 ) This moves the SME tile vector.print lowering from `-convert-arm-sme-to-scf` to `-convert-vector-to-arm-sme`. This seems like a more logical place, as this is lowering a vector op to ArmSME, and it also prevents vector.print from blocking tile allocation.	2023-12-04 09:42:11 +00:00
Benjamin Maxwell	eaff02f28e	[mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253 ) This reworks the ArmSME dialect to use attributes for tile allocation. This has a number of advantages and corrects some issues with the previous approach: * Tile allocation can now be done ASAP (i.e. immediately after `-convert-vector-to-arm-sme`) * SSA form for control flow is now supported (e.g.`scf.for` loops that yield tiles) * ArmSME ops can be converted to intrinsics very late (i.e. after lowering to control flow) * Tests are simplified by removing constants and casts * Avoids correctness issues with representing LLVM `immargs` as MLIR values - The tile ID on the SME intrinsics is an `immarg` (so is required to be a compile-time constant), `immargs` should be mapped to MLIR attributes (this is already the case for intrinsics in the LLVM dialect) - Using MLIR values for `immargs` can lead to invalid LLVM IR being generated (and passes such as -cse making incorrect optimizations) As part of this patch we bid farewell to the following operations: ```mlir arm_sme.get_tile_id : i32 arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32> arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32 ``` These are now replaced with: ```mlir // Allocates a new tile with (indeterminate) state: arm_sme.get_tile : vector<[4]x[4]xi32> // A placeholder operation for lowering ArmSME ops to intrinsics: arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32> ``` The new tile allocation works by operations implementing the `ArmSMETileOpInterface`. This interface says that an operation needs to be assigned a tile ID, and may conditionally allocate a new SME tile. Operations allocate a new tile by implementing... ```c++ std::optional<arm_sme::ArmSMETileType> getAllocatedTileType() ``` ...and returning what type of tile the op allocates (ZAB, ZAH, etc). Operations that don't allocate a tile return `std::nullopt` (which is the default behaviour). 
Currently the following ops are defined as allocating: ```mlir arm_sme.get_tile arm_sme.zero arm_sme.tile_load arm_sme.outerproduct // (if no accumulator is specified) ``` Allocating operations become the roots for the tile allocation pass, which currently just (naively) assigns all transitive uses of a root operation the same tile ID. However, this is enough to handle current use cases. Once tile IDs have been allocated subsequent rewrites can forward the tile IDs to any newly created operations.	2023-11-30 10:22:22 +00:00
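The naive allocation described in eaff02f28e (every transitive use of a root gets the root's tile ID) can be modelled roughly as follows. This is a sketch under stated assumptions, not the pass itself. Ops are plain string IDs, the `users` map is supplied by hand, tile IDs are handed out as consecutive integers without any per-type limits, and `arm_sme.outerproduct` is left out of the root set because it only allocates when it has no accumulator. The names `ALLOCATING_OPS` and `allocate_tiles` are hypothetical.

```python
from collections import deque

# Ops treated as allocation roots in this model (outerproduct omitted, see above).
ALLOCATING_OPS = {"arm_sme.get_tile", "arm_sme.zero", "arm_sme.tile_load"}


def allocate_tiles(ops, users):
    """Assign a tile ID to each root and to all transitive uses of that root.

    ops:   list of (op_id, op_name) in program order
    users: dict mapping op_id -> list of op_ids that use its result
    """
    tile_ids = {}
    next_tile = 0
    for op_id, name in ops:
        if name not in ALLOCATING_OPS or op_id in tile_ids:
            continue
        # Breadth-first walk over the uses of this root.
        queue = deque([op_id])
        while queue:
            current = queue.popleft()
            if current in tile_ids:
                continue
            tile_ids[current] = next_tile
            queue.extend(users.get(current, []))
        next_tile += 1
    return tile_ids


ops = [
    ("zero", "arm_sme.zero"),
    ("update", "arm_sme.move_vector_to_tile_slice"),
    ("store", "arm_sme.tile_store"),
    ("load", "arm_sme.tile_load"),
    ("store2", "arm_sme.tile_store"),
]
users = {"zero": ["update"], "update": ["store"], "load": ["store2"]}
assignment = allocate_tiles(ops, users)
```

In this example the `zero` chain shares one ID and the `load` chain gets the next one, which matches the "roots plus transitive uses" description in the commit message.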
Benjamin Maxwell	c4c52d4199	[mlir][ArmSME] Move vector.extract/insert lowerings to vector-to-arm-sme (NFC) (#72852 ) These were placed in LegalizeForLLVMExport.cpp, which is the wrong stage for these, as these lower to high-level ArmSME ops, not intrinsics.	2023-11-20 14:04:59 +00:00
Matthias Springer	32c3decb77	[mlir][vector] Modernize `vector.transpose` op (#72594 ) * Declare arguments/results with `let` statements. * Rename `transp` to `permutation`. * Change type of `transp` from `I64ArrayAttr` to `DenseI64ArrayAttr` (provides direct access to `ArrayRef<int64_t>` instead of `ArrayAttr`).	2023-11-20 11:25:35 +01:00
Cullen Rhodes	4240b1790f	[mlir][ArmSME] Lower transfer_write + transpose to vertical store (#71181 ) This patch extends the lowering of vector.transfer_write in VectorToArmSME to support in-flight transpose via SME vertical store.	2023-11-10 07:51:06 +00:00
Cullen Rhodes	22f1159223	[mlir][ArmSME] Propagate pad and mask in vector.transfer_read lowering (#70814 ) This extends the vector.transfer_read -> arm_sme.tile_load lowering to propagate pad and mask. The restriction on the transfer_read being a transposition is also removed; identity maps are lowered to normal horizontal loads.	2023-11-02 10:23:38 +00:00
Cullen Rhodes	1908f47a9b	[mlir][ArmSME] Add optional mask operand to tile_store (#70657 )	2023-11-01 07:15:51 +00:00
Benjamin Maxwell	e666295011	[mlir][ArmSME] Support lowering masked vector.outerproduct ops to SME (#69604 ) This patch adds support for lowering masked outer products to SME. This is done in two stages. First, vector.outerproducts (both masked and non-masked) are rewritten to arm_sme.outerproducts. The arm_sme.outerproduct op is close to vector.outerproduct, but supports masking on the operands rather than the result. It also limits the cases it handles to things that could be (directly) lowered to SME. This currently requires that the source of the mask is a vector.create_mask op. E.g.: ```mlir %mask = vector.create_mask %dimA, %dimB : vector<[4]x[4]xi1> %result = vector.mask %mask { vector.outerproduct %vecA, %vecB : vector<[4]xf32>, vector<[4]xf32> } : vector<[4]x[4]xi1> -> vector<[4]x[4]xf32> ``` Is rewritten to: ``` %maskA = vector.create_mask %dimA : vector<[4]xi1> %maskB = vector.create_mask %dimB : vector<[4]xi1> %result = arm_sme.outerproduct %vecA, %vecB masks(%maskA, %maskB) : vector<[4]xf32>, vector<[4]xf32> ``` (The same rewrite works for non-masked vector.outerproducts too) The arm_sme.outerproduct can then be directly lowered to SME intrinsics.	2023-10-31 09:06:21 +00:00
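The mask decomposition in e666295011 relies on a 2-D `vector.create_mask` being the outer "and" of two 1-D masks. A minimal model of that, assuming `vscale = 1`, in-range mask bounds, and using `None` as a stand-in for "element not written" (a modelling choice, not a statement about what the hardware leaves in inactive elements). The function names are hypothetical.

```python
def create_mask_1d(bound, size):
    """Model of a 1-D vector.create_mask: lanes [0, bound) are active."""
    return [i < bound for i in range(size)]


def outerproduct_result_masked(vec_a, vec_b, dim_a, dim_b):
    """vector.mask form: one 2-D mask applied to the result."""
    return [
        [vec_a[i] * vec_b[j] if (i < dim_a and j < dim_b) else None
         for j in range(len(vec_b))]
        for i in range(len(vec_a))
    ]


def outerproduct_operand_masked(vec_a, vec_b, dim_a, dim_b):
    """arm_sme.outerproduct form: one 1-D mask per operand."""
    mask_a = create_mask_1d(dim_a, len(vec_a))
    mask_b = create_mask_1d(dim_b, len(vec_b))
    return [
        [vec_a[i] * vec_b[j] if (mask_a[i] and mask_b[j]) else None
         for j in range(len(vec_b))]
        for i in range(len(vec_a))
    ]


vec_a = [1.0, 2.0, 3.0, 4.0]
vec_b = [10.0, 20.0, 30.0, 40.0]
same = all(
    outerproduct_result_masked(vec_a, vec_b, a, b)
    == outerproduct_operand_masked(vec_a, vec_b, a, b)
    for a in range(5)
    for b in range(5)
)
```

Under these assumptions both forms write exactly the same elements for every pair of in-range bounds.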
Cullen Rhodes	d86047cb66	[mlir][ArmSME] Update tile slice layout syntax (#69151 ) This patch prefixes tile slice layout with `layout` in the assemblyFormat: - `<vertical>` -> `layout<vertical>` - `<horizontal>` -> `layout<horizontal>` The reason for this change is the current format doesn't play nicely with additional optional operands, required to support padding and masking (#69148), as it becomes ambiguous. This affects the following ops: - arm_sme.tile_load - arm_sme.tile_store - arm_sme.load_tile_slice - arm_sme.store_tile_slice	2023-10-16 10:55:30 +01:00
Andrzej Warzynski	23b5f92c97	[mlir][SME] Re-order patterns alphabetically (nfc)	2023-09-29 16:54:47 +00:00
Andrzej Warzyński	35dd3a6475	[mlir][SME][nfc] Clarify the usage of insertion guard (#67668 ) Added extra comment that should clarify the need for an insertion guard when using `getLoopOverTileSlices`. Also removed some redundant calls to `setInsertionPointAfter` - the insertion guard would overwrite that on destruction anyway.	2023-09-29 17:33:57 +01:00
Goran Flegar	042468bff5	[mlir][SME] Fix unused variable warning	2023-09-28 15:24:27 +02:00
Andrzej Warzyński	0cb0df41d4	[mlir][SME] Add vector.splat -> SME conversion (#67659 ) This conversion is identical to vector.broadcast when broadcasting a scalar.	2023-09-28 13:46:24 +01:00
Cullen Rhodes	8e64e9c365	[mlir][ArmSME] Add support for vector.transfer_read with transpose (#67527 ) This patch adds support for lowering a vector.transfer_read with a transpose permutation map to a vertical tile load, for example: vector.transfer_read ... permutation_map: (d0, d1) -> (d1, d0) is converted to: arm_sme.tile_load ... <vertical> On SME the transpose can be done in-flight, rather than as a separate operation as in the TransferReadPermutationLowering, which would do the following: %0 = vector.transfer_read ... vector.transpose %0, [1, 0] ... The lowering doesn't support masking yet and the transfer_read must be in-bounds. It also intentionally doesn't handle simple loads as transfer_write currently does, as the generic TransferReadToVectorLoadLowering can lower these to simple vector.load ops, which can already be lowered to ArmSME. A subsequent patch will update the existing transfer_write lowering, this is a separate patch as there is currently no lowering for vector.transfer_read.	2023-09-28 10:54:20 +01:00
Cullen Rhodes	eaf15900ff	[mlir][ArmSME] Add support for vector.transpose (#66760 ) This patch adds support for lowering vector.transpose to ArmSME. It's implemented by storing the input tile of the transpose to memory and reloading vertically, building on top of the tile slice layout support. Transposing via memory is obviously expensive; the current intention is to avoid the transpose if possible, so this is intended as a fallback and to provide base support for Vector ops. If it turns out transposes can't be avoided then this should be replaced with a more optimal implementation, perhaps with tile <-> vector (MOVA) ops. Depends on https://github.com/llvm/llvm-project/pull/66758.	2023-09-25 12:15:12 +01:00
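The "store, then reload vertically" idea in eaf15900ff can be illustrated with a toy model. This is only a sketch: it assumes a square tile held as a list of rows, treats memory as a list of rows, and uses hypothetical helper names rather than anything from the ArmSME dialect.

```python
def store_tile_horizontal(tile):
    """Write each horizontal slice (row) of the tile to consecutive memory rows."""
    return [list(row) for row in tile]


def load_tile_vertical(memory):
    """Load memory row i into vertical slice (column) i of a fresh square tile."""
    n = len(memory)
    tile = [[None] * n for _ in range(n)]
    for i, row in enumerate(memory):
        for j, value in enumerate(row):
            tile[j][i] = value
    return tile


def transpose_via_memory(tile):
    """Round-trip through memory: horizontal store followed by vertical load."""
    return load_tile_vertical(store_tile_horizontal(tile))


tile = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
transposed = transpose_via_memory(tile)
```

The same vertical-load step, applied directly to the source memory instead of a temporary buffer, is what makes the in-flight transpose of the later transfer_read commits cheaper than this fallback.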
Cullen Rhodes	2dd3f42083	[mlir][ArmSME] Lower vector.broadcast to ArmSME This adds support for lowering vector.broadcast ops to SME, if the source is either a scalar, 0-d vector, or 1-d vector, and the result a 2-d scalable vector that aligns with SME tiles. This follows on from D157005 which introduced a vector to tile slice op that moves a 1-d scalable vector to a slice of a 2-d scalable vector (tile). The lowering from vector.broadcast is similar, a couple of helper functions are added to prevent duplication. Lowering of vector.broadcast contributes towards a path from linalg.fill to SME. Depends on D157005 Reviewed By: awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D158586	2023-08-29 09:43:16 +00:00
Cullen Rhodes	3b4b6cbba5	[mlir][ArmSME] Add move vector to tile slice op and lowerings This adds a 'move_vector_to_tile_slice' op to the ArmSME dialect that moves a 1-D scalable vector to a slice of a 2-D tile at a given index. This is lowered to the 'llvm.aarch64.sme.write.horiz' intrinsic that maps to the MOVA (vector to tile, single) SME instruction [1] when lowering to LLVM. Like the SME load and store instructions this operates on ZA tile slices, which are 1D vectors of horizontally or vertically contiguous elements within a ZA tile. This patch extends the lowering of 'arith.constant' to SME to support non-zero constants using this new op. This requires materializing a loop that broadcasts the constant to each tile slice with the 'vector_to_tile_slice' op. Unlike load and store, this is done during conversion from Vector to ArmSME, rather than ArmSME to SCF. The latter would require a higher-level custom op in the ArmSME dialect like 'tile_load' and 'tile_store' and this isn't necessary. We may also remove the load and store ops in the future in favour of lowering straight from Vector, at which point this would converge. Currently only horizontal tile slices are supported. A future patch will extend this mechanism to support 'vector.broadcast'. Depends on D156980 D157004 [1] https://developer.arm.com/documentation/ddi0602 Reviewed By: awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D157005	2023-08-29 09:29:22 +00:00
Cullen Rhodes	dfa10ec2e6	[mlir][ArmSME] Extend arm_sme.zero for all types The arm_sme.zero op currently only supports 8-bit element tiles. This extends the op and lowering from 'arith.constant dense<0>' -> 'arm_sme.zero' to support all tile types. The lowering from arm_sme.zero to intrinsics is not updated as part of this patch and will be done separately. Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D156980	2023-08-11 12:44:56 +00:00
Cullen Rhodes	12e1a9b876	[mlir][ArmSME] Extend vector.transfer_write lowering Enables the lowering of other tile types and values to match the vector.store -> arm_sme.tile_store lowering. Reviewed By: awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D156976	2023-08-11 12:33:09 +00:00
Cullen Rhodes	781883ea62	[mlir][ArmSME] Split lowering of arith.constant from vector.transfer_write An 'arith.constant dense<0>' is currently lowered to 'arm_sme.zero' as part of the 'vector.transfer_write' lowering during '-vector-to-arm-sme' conversion. This patch makes this lowering independent of the 'vector.transfer_write'. This can then be extended for further tile types and non-zero constants. Reviewed By: awarzynski Differential Revision: https://reviews.llvm.org/D156802	2023-08-03 08:57:33 +00:00
Cullen Rhodes	ca9a3354d0	[mlir][ArmSME] Add tile load op and extend tile store tile size support This extends the existing 'arm_sme.tile_store' op to support all tile sizes and adds a new op 'arm_sme.tile_load', as well as lowerings from vector -> custom ops and custom ops -> intrinsics. Currently there's no lowering for i128. Depends on D154867 Reviewed By: awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D155306	2023-07-25 08:28:36 +00:00
Andrzej Warzynski	447bb5bee4	[mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME) At the moment, the lowering from the Vector dialect to SME looks like this: * Vector --> SME LLVM IR intrinsics This patch introduces a new lowering layer between the Vector dialect and the Arm SME extension: * Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics. This is motivated by 2 considerations: 1. Storing `ZA` to memory (e.g. `vector.transfer_write`) requires an `scf.for` loop over all rows of `ZA`. Similar logic will apply to "load to ZA from memory". This is a rather complex transformation and a custom Op seems justified. 2. As discussed in [1], we need to prevent the LLVM type converter from having to convert types unsupported in LLVM, e.g. `vector<[16]x[16]xi8>`. A dedicated abstraction layer with custom Ops opens a path to some fine tuning (e.g. custom type converters) that will allow us to avoid this. To facilitate this change, two new custom SME Ops are introduced: * `TileStoreOp`, and * `ZeroOp`. Note that no new functionality is added - these Ops merely model what's already supported. In particular, the following tile size is assumed (dimension and element size are fixed): * `vector<[16]x[16]xi8>` The new lowering layer is introduced via a conversion pass between the Vector and the SME dialects. You can use the `-convert-vector-to-sme` flag to run it. The following function: ``` func.func @example(%arg0 : memref<?x?xi8>) { // (...) %cst = arith.constant dense<0> : vector<[16]x[16]xi8> vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8> return } ``` would be lowered to: ``` func.func @example(%arg0: memref<?x?xi8>) { // (...) %0 = arm_sme.zero : vector<[16]x[16]xi8> arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8> return } ``` Later, a mechanism will be introduced to guarantee that `arm_sme.zero` and `arm_sme.tile_store` operate on the same virtual tile. 
For `i8` elements this is not required as there is only one tile. In order to lower the above output to LLVM, use * `-convert-vector-to-llvm="enable-arm-sme"`. [1] https://github.com/openxla/iree/issues/14294 Reviewed By: WanderAway Differential Revision: https://reviews.llvm.org/D154867	2023-07-18 08:04:59 +00:00

30 Commits