llvm-project

Author	SHA1	Message	Date
Benjamin Maxwell	042800a4dd	[mlir][ArmSME] Add initial SME vector legalization pass (#79152) This adds a new pass (`-arm-sme-vector-legalization`) which legalizes vector operations so that they can be lowered to ArmSME. This initial patch adds decomposition for `vector.outerproduct`, `vector.transfer_read`, and `vector.transfer_write` when they operate on vector types larger than a single SME tile. For example, a [8]x[8]xf32 outer product would be decomposed into four [4]x[4]xf32 outer products, which could then be lowered to ArmSME. These three ops were picked because supporting them alone allows lowering matmuls that use all ZA accumulators to ArmSME. For a vector type to be legalizable, it has to be a multiple of an SME tile size, but other than that any shape can be used. E.g. `vector<[8]x[8]xf32>`, `vector<[4]x[16]xf32>`, and `vector<[16]x[4]xf32>` can all be lowered to four `vector<[4]x[4]xf32>` operations. In the future, this pass will be extended with more SME-specific rewrites to legalize unrolling the reduction dimension of matmuls (which is not type decomposition), which is why the pass has quite a general name.	2024-01-31 11:55:22 +00:00
Cullen Rhodes	9f7fff7f13	[mlir][ArmSME] Add arith-to-arm-sme conversion pass (#78197) The existing `arith::ConstantOp` conversion and tests are moved from VectorToArmSME. Only a single op is converted at the moment, but this will grow in the future as things like in-tile add are implemented. Also, `createLoopOverTileSlices` is moved to ArmSME utils since it's relevant for both conversions.	2024-01-22 09:23:11 +00:00
Benjamin Maxwell	5417a5fed6	[mlir][ArmSME] Add rudimentary support for tile spills to the stack (#76086) This adds very basic (and inelegant) support for something like spilling and reloading tiles, if you use more SME tiles than physically exist. This is purely implemented to prevent the compiler from aborting if a function uses too many tiles (i.e. due to bad unrolling), but is expected to perform very poorly. Currently, this works in two stages: During tile allocation, if we run out of tiles, instead of giving up we switch to allocating 'in-memory' tile IDs. These are tile IDs that start at 16 (which is higher than any real tile ID). A warning will also be emitted for each (root) tile op assigned an in-memory tile ID: ``` warning: failed to allocate SME virtual tile to operation, all tile operations will go through memory, expect degraded performance ``` Everything after this works like normal until `-convert-arm-sme-to-llvm`. Here the in-memory tile op: ```mlir arm_sme.tile_op { tile_id = <IN MEMORY TILE> } ``` is lowered to: ```mlir // At function entry: %alloca = memref.alloca ... : memref<?x?xty> // Around the op: // Swap the contents of %alloca and tile 0. scf.for %slice_idx { %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}> "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx) <{tile_id = 0 : i32}> vector.store %current_slice, %alloca[%slice_idx, %c0] } // Execute op using tile 0. arm_sme.tile_op { tile_id = 0 } // Swap the contents of %alloca and tile 0. // This restores tile 0 to its original state. scf.for %slice_idx { %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}> "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx) <{tile_id = 0 : i32}> vector.store %current_slice, %alloca[%slice_idx, %c0] } ``` This is inserted during the lowering to LLVM, as spilling/reloading registers is a very low-level concept that can't really be modeled correctly at a high level in MLIR. Note: This always does the worst-case full-tile swap. This could be optimized to only spill/load the data the tile op will use, which could be just a slice. It's also not making any use of liveness, which could allow reusing tiles. But this is not seen as important, as correct code should only use the available number of tiles.	2024-01-12 14:51:47 +00:00
Benjamin Maxwell	eaff02f28e	[mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253) This reworks the ArmSME dialect to use attributes for tile allocation. This has a number of advantages and corrects some issues with the previous approach: * Tile allocation can now be done ASAP (i.e. immediately after `-convert-vector-to-arm-sme`) * SSA form for control flow is now supported (e.g. `scf.for` loops that yield tiles) * ArmSME ops can be converted to intrinsics very late (i.e. after lowering to control flow) * Tests are simplified by removing constants and casts * Avoids correctness issues with representing LLVM `immargs` as MLIR values - The tile ID on the SME intrinsics is an `immarg` (so is required to be a compile-time constant), and `immargs` should be mapped to MLIR attributes (this is already the case for intrinsics in the LLVM dialect) - Using MLIR values for `immargs` can lead to invalid LLVM IR being generated (and passes such as `-cse` making incorrect optimizations) As part of this patch we bid farewell to the following operations: ```mlir arm_sme.get_tile_id : i32 arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32> arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32 ``` These are now replaced with: ```mlir // Allocates a new tile with (indeterminate) state: arm_sme.get_tile : vector<[4]x[4]xi32> // A placeholder operation for lowering ArmSME ops to intrinsics: arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32> ``` The new tile allocation works by having operations implement the `ArmSMETileOpInterface`. This interface says that an operation needs to be assigned a tile ID, and may conditionally allocate a new SME tile. Operations allocate a new tile by implementing... ```c++ std::optional<arm_sme::ArmSMETileType> getAllocatedTileType() ``` ...and returning what type of tile the op allocates (ZAB, ZAH, etc.). Operations that don't allocate a tile return `std::nullopt` (which is the default behaviour). Currently the following ops are defined as allocating: ```mlir arm_sme.get_tile arm_sme.zero arm_sme.tile_load arm_sme.outerproduct // (if no accumulator is specified) ``` Allocating operations become the roots for the tile allocation pass, which currently just (naively) assigns all transitive uses of a root operation the same tile ID. However, this is enough to handle current use cases. Once tile IDs have been allocated, subsequent rewrites can forward the tile IDs to any newly created operations.	2023-11-30 10:22:22 +00:00

4 Commits