llvm-project

Author	SHA1	Message	Date
Benjamin Maxwell	fb8eb4251a	[mlir][ArmSME] Fix loop bounds of masked loads/stores (#78983) Previously, for masked tile loads/stores we directly used the dimension size from the `vector.create_mask` operation as the upper bound of the `scf.for` over the tile slices. This was not correct, as `create_mask` allows operands to be greater than the size of the vector dimension, in which case the for loop bounds should be clamped to the number of tile slices.	2024-01-26 09:39:43 +00:00
Benjamin Maxwell	8ff16f646f	[mlir][ArmSME] Refactor ArmSMEToSCF to use shared loop-building helper (NFC) (#79172) This will make fixing a bug (next patch) a change to one place, rather than fixing three separate rewrites. Note: `TileLoadOpWithMaskAndPadZeroConversion` has been merged into `TileLoadOpConversion`, since after this change those two rewrites were pretty much identical.	2024-01-25 15:11:46 +00:00
Benjamin Maxwell	01ac530a2e	[mlir][ArmSME] Remove `vector.print` legality from ArmSMEToSCF (NFC) (#74875) This was moved to VectorToArmSME in #74063, so this is no longer needed. VectorToArmSME uses a greedy rewriter, so a similar legality rule is not needed there. See: `bbb8a0df73/mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp (L35)`	2023-12-11 11:25:43 +00:00
Benjamin Maxwell	b0b69fd879	[mlir][ArmSME] More precisely model dataflow in ArmSME to SCF lowerings (#73922)

Since #73253, loops over tiles in SSA form (i.e. loops that take `iter_args` and yield a new tile) are supported, so this patch updates ArmSME lowerings to this form. This is an NFC, as it still lowers to the same intrinsics, but it makes the IR less 'surprising' at a higher level, and may be recognised by more transforms.

Example:

IR before:

```mlir
scf.for %tile_slice_index = %c0 to %num_tile_slices step %c1
{
  arm_sme.move_vector_to_tile_slice
    %broadcast_to_1d, %tile, %tile_slice_index :
    vector<[4]xi32> into vector<[4]x[4]xi32>
}
// ... later use %tile
```

IR now:

```mlir
%broadcast_to_tile = scf.for %tile_slice_index = %c0 to %num_tile_slices
    step %c1 iter_args(%iter_tile = %init_tile) -> (vector<[4]x[4]xi32>)
{
  %tile_update = arm_sme.move_vector_to_tile_slice
    %broadcast_to_1d, %iter_tile, %tile_slice_index :
    vector<[4]xi32> into vector<[4]x[4]xi32>
  scf.yield %tile_update : vector<[4]x[4]xi32>
}
// ... later use %broadcast_to_tile
```
	2023-12-06 14:31:05 +00:00
Benjamin Maxwell	10063c5a29	[mlir][ArmSME] Move vector.print -> ArmSME lowering to VectorToArmSME (#74063) This moves the SME tile vector.print lowering from `-convert-arm-sme-to-scf` to `-convert-vector-to-arm-sme`. This seems like a more logical place, as this is lowering a vector op to ArmSME, and it also prevents vector.print from blocking tile allocation.	2023-12-04 09:42:11 +00:00
Benjamin Maxwell	eaff02f28e	[mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253)

This reworks the ArmSME dialect to use attributes for tile allocation. This has a number of advantages and corrects some issues with the previous approach:

* Tile allocation can now be done ASAP (i.e. immediately after `-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g. `scf.for` loops that yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after lowering to control flow)
* Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR values
  - The tile ID on the SME intrinsics is an `immarg` (so is required to be a compile-time constant), `immargs` should be mapped to MLIR attributes (this is already the case for intrinsics in the LLVM dialect)
  - Using MLIR values for `immargs` can lead to invalid LLVM IR being generated (and passes such as -cse making incorrect optimizations)

As part of this patch we bid farewell to the following operations:

```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```

These are now replaced with:

```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```

The new tile allocation works by operations implementing the `ArmSMETileOpInterface`. This interface says that an operation needs to be assigned a tile ID, and may conditionally allocate a new SME tile.

Operations allocate a new tile by implementing...

```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```

...and returning what type of tile the op allocates (ZAB, ZAH, etc). Operations that don't allocate a tile return `std::nullopt` (which is the default behaviour).

Currently the following ops are defined as allocating:

```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```

Allocating operations become the roots for the tile allocation pass, which currently just (naively) assigns all transitive uses of a root operation the same tile ID. However, this is enough to handle current use cases.

Once tile IDs have been allocated subsequent rewrites can forward the tile IDs to any newly created operations.
	2023-11-30 10:22:22 +00:00
Cullen Rhodes	9783cf448a	[mlir][ArmSME] Add support for lowering masked tile_load ops (#70915) This patch extends ArmSMEToSCF to support lowering of masked tile_load ops. Only masks created by 'vector.create_mask' are currently supported. There are two lowerings depending on the pad. For pad of constant zero, the tile is first zeroed, then only active rows are loaded. For non-zero pad, the scalar pad is broadcast to a 1-D vector and a regular 'vector.masked_load' (will be lowered to SVE, not SME) loads each slice, with padding specified as a passthru and the 2-D mask combined into a 1-D mask. The resulting slice is then inserted into the tile with 'arm_sme.move_vector_to_tile_slice'.	2023-11-08 09:02:09 +00:00
Cullen Rhodes	ed350bb3d8	[mlir][ArmSME] Add support for lowering masked tile_store ops (#71180)

This patch extends ArmSMEToSCF to support lowering of masked tile_store ops. Only masks created by 'vector.create_mask' are currently supported.

Example:

    %mask = vector.create_mask %c3, %c2 : vector<[4]x[4]xi1>
    arm_sme.tile_store %tile, %dest[%c0, %c0], %mask : memref<?x?xi32>, vector<[4]x[4]xi32>

Produces:

    %num_rows = arith.constant 3 : index
    %num_cols = vector.create_mask %c2 : vector<[4]xi1>
    scf.for %slice_idx = %c0 to %num_rows step %c1
      arm_sme.store_tile_slice %tile, %slice_idx, %num_cols, %dest[%slice_idx, %c0]
        : memref<?x?xi32>, vector<[4]xi1>, vector<[4]x[4]xi32>
	2023-11-06 11:18:57 +00:00
Cullen Rhodes	8f564e014e	[mlir][ArmSME] Add mask operand to store_tile_slice (#70838)	2023-11-02 08:43:37 +00:00
Cullen Rhodes	8ea260a093	[mlir][ArmSME] Add mask operand to load_tile_slice (#70655)	2023-10-31 13:08:55 +00:00
Cullen Rhodes	d86047cb66	[mlir][ArmSME] Update tile slice layout syntax (#69151)

This patch prefixes tile slice layout with `layout` in the assemblyFormat:

- `<vertical>` -> `layout<vertical>`
- `<horizontal>` -> `layout<horizontal>`

The reason for this change is that the current format doesn't play nicely with the additional optional operands required to support padding and masking (#69148), as it becomes ambiguous.

This affects the following ops:

- arm_sme.tile_load
- arm_sme.tile_store
- arm_sme.load_tile_slice
- arm_sme.store_tile_slice
	2023-10-16 10:55:30 +01:00
Benjamin Maxwell	b34f15df55	[mlir][ArmSME] Add arm_sme.move_tile_slice_to_vector op (#67652) This adds a simple higher-level op for the tile slice to vector intrinsics (and updates the existing vector.print lowering to use it). This op will be used a few more times to implement vector.insert/extract lowerings in later patches.	2023-09-29 10:33:09 +01:00
Benjamin Maxwell	174cd6145b	[mlir][ArmSME] Add custom vector.print lowering for SME tiles (#66691) This adds a custom lowering for SME that loops over each row of the tile, extracting it via an SME MOVA, then printing with a normal 1D vector.print. This makes writing SME integration tests easier and less verbose. Depends on: #66910, #66911	2023-09-26 17:09:57 +01:00
Cullen Rhodes	75a71c27c1	[mlir][ArmSME] Support vertical layout in load and store ops (#66758) In SME a ZA tile slice is a one-dimensional set of horizontally or vertically contiguous elements within a ZA tile. Currently the load and store ops only support horizontal tile slices. This patch adds a tile slice layout attribute to the load and store ops to support both horizontal and vertical tile slices. When lowering from the Vector dialect, horizontal layout is the default.	2023-09-25 09:34:23 +01:00
Cullen Rhodes	65a6be5de9	[mlir][ArmSME] Use memref indices for load and store This patch extends the ArmSME load and store op lowering to use the memref indices. An integration test that loads two 32-bit element ZA tiles from memory and stores them back to memory in reverse order to verify this is added. Depends on D156467 D156558 Reviewed By: awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D156689	2023-08-03 08:50:12 +00:00
Cullen Rhodes	9e1b825321	[mlir][ArmSME] Add conversion from ArmSME to SCF to materialize loops

Currently a loop is materialized when lowering ArmSME loads and stores to intrinsics. This patch introduces two new ops to the ArmSME dialect that map 1-1 with intrinsics:

1. arm_sme.load_tile_slice - Loads a 1D tile slice from memory into a 2D SME "virtual tile".
2. arm_sme.store_tile_slice - Stores a 1D tile slice from a 2D SME "virtual tile" into memory.

As well as a new conversion pass '-convert-arm-sme-to-scf' that materializes loops with these ops. The existing load/store lowering to intrinsics is updated to use these ops.

Depends on D156517
Discourse thread: https://discourse.llvm.org/t/loop-materialization-in-armsme/72354
Reviewed By: awarzynski, dcaballe, WanderAway
Differential Revision: https://reviews.llvm.org/D156467
	2023-08-01 08:20:02 +00:00
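
Note on fb8eb4251a (#78983): the clamping described in that commit message can be illustrated with plain integers. This is only a sketch of the arithmetic, not the code from the patch; the helper names `numTileSlices` and `sliceLoopUpperBound` are hypothetical, and it assumes the number of tile slices is the scalable dimension's minimum size multiplied by the runtime `vscale`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not from the LLVM sources): number of slices in an SME
// tile whose leading dimension is [minSlices], e.g. vector<[4]x[4]xi32> has
// 4 * vscale slices.
constexpr std::int64_t numTileSlices(std::int64_t minSlices, std::int64_t vscale) {
  return minSlices * vscale;
}

// Hypothetical helper: upper bound for the loop over tile slices. A
// `vector.create_mask` operand may exceed the vector dimension, so the bound
// is the smaller of the mask dimension size and the number of tile slices.
constexpr std::int64_t sliceLoopUpperBound(std::int64_t maskDimSize,
                                           std::int64_t tileSlices) {
  return std::min(maskDimSize, tileSlices);
}

// A mask that fits inside the tile is used unchanged.
static_assert(sliceLoopUpperBound(3, numTileSlices(4, 2)) == 3, "in-range mask");
// An oversized mask operand is clamped to the number of tile slices.
static_assert(sliceLoopUpperBound(100, numTileSlices(4, 2)) == 8, "clamped mask");
```

In the actual lowering the bound is computed on MLIR values at runtime rather than on compile-time integers; the sketch only shows why using the mask operand directly could run the `scf.for` past the last tile slice.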

16 Commits