llvm-project

Author	SHA1	Message	Date
Frank Schlimbach	d5746d73ce	eliminating g++ warnings (#105520 ) Eliminating g++ warnings. Mostly declaring "[[maybe_unused]]", adding return statements where missing and fixing casts. @rengolin --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech> Co-authored-by: Renato Golin <rengolin@systemcall.eu>	2024-10-18 21:20:47 +01:00
Benjamin Maxwell	c42512436b	[mlir][ArmSME] Rename slice move operations to insert/extract_tile_slice (#106755 ) This renames: - `arm_sme.move_tile_slice_to_vector` to `arm_sme.extract_tile_slice` - `arm_sme.move_vector_to_tile_slice` to `arm_sme.insert_tile_slice` The new names are more consistent with the rest of MLIR and should be easier to understand. The current names (to me personally) are hard to parse and easy to mix up when skimming through code. Additionally, the syntax for `insert_tile_slice` has changed from: ```mlir %4 = arm_sme.insert_tile_slice %0, %1, %2 : vector<[16]xi8> into vector<[16]x[16]xi8> ``` To: ```mlir %4 = arm_sme.insert_tile_slice %0, %1[%2] : vector<[16]xi8> into vector<[16]x[16]xi8> ``` This is for consistency with `extract_tile_slice`, but also helps with readability as it makes it clear which operand is the index.	2024-09-02 11:12:40 +01:00
Benjamin Maxwell	e37d6d2a74	[mlir][ArmSME] Merge consecutive `arm_sme.intr.zero` ops (#106215 ) This merges consecutive SME zero intrinsics within a basic block, which avoids the backend eventually emitting multiple zero instructions when it could just use one. Note: This kind of peephole optimization could be implemented in the backend too.	2024-08-29 09:43:38 +01:00
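The merge described in the commit above can be sketched in isolation. This is a minimal standalone model, not the MLIR implementation: `ToyOp`, its `zeroMask` field, and `mergeConsecutiveZeros` are hypothetical names, and the assumption that each zero op carries a bitmask of tiles to clear is made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-in for an operation in a basic block (hypothetical type).
// `zeroMask` is set only for "zero" ops and holds the assumed bitmask of
// tiles to clear; every other op leaves it empty.
struct ToyOp {
  std::optional<uint32_t> zeroMask;
};

// Merge each run of adjacent zero ops into a single op whose mask is the
// bitwise OR of the run. Any non-zero op ends the current run.
std::vector<ToyOp> mergeConsecutiveZeros(const std::vector<ToyOp> &block) {
  std::vector<ToyOp> result;
  for (const ToyOp &op : block) {
    bool canMerge = op.zeroMask && !result.empty() && result.back().zeroMask;
    if (canMerge)
      *result.back().zeroMask |= *op.zeroMask;
    else
      result.push_back(op);
  }
  assert(result.size() <= block.size() && "merging never adds ops");
  return result;
}
```

Only directly adjacent zero ops are merged in this sketch, so an unrelated op between two zeros keeps them separate.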
Benjamin Maxwell	c93b45aaee	[mlir][ArmSME] Reword in-memory tile warning (NFC) (#92415 ) It did not make sense that this said "all tile operations will go through memory". Only the operations where the warning is emitted will go through memory. The message has been updated to reflect that.	2024-05-17 11:20:11 +01:00
Cullen Rhodes	b5107bdda3	[mlir][ArmSME] Verify ops on tile types post LLVM conversion (#92076 ) Unsupported ops on tile types can become dead after `-convert-arm-sme-to-llvm` resulting in incorrect results. Verify such operations don't exist post-conversion and fail if they do. Based on discussion from https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543	2024-05-16 08:17:37 +01:00
Benjamin Maxwell	041baf2f60	[mlir][ArmSME] Use liveness information in the tile allocator (#90448 ) This patch rewrites the ArmSME tile allocator to use liveness information to make better tile allocation decisions and improve the correctness of the ArmSME dialect. The algorithm used here is a linear scan over live ranges, where live ranges are assigned to tiles as they appear in the program (chronologically). Live ranges release their assigned tile ID when the current program point is past their end. This is a greedy algorithm, chosen mainly to keep the implementation relatively straightforward and because it seems to be sufficient for most kernels (e.g. matmuls) that use ArmSME. The general steps of this are roughly from https://link.springer.com/content/pdf/10.1007/3-540-45937-5_17.pdf, though there have been a few simplifications and assumptions made for our use case. Hopefully, the only changes needed for a user of the ArmSME dialect are that: - `-allocate-arm-sme-tiles` will no longer be a standalone pass - `-test-arm-sme-tile-allocation` is only for unit tests - `-convert-arm-sme-to-llvm` must happen after `-convert-scf-to-cf` - SME tile allocation is now part of the LLVM conversion By integrating this into the `ArmSME -> LLVM` conversion we can allow high-level (value-based) ArmSME operations to be side-effect-free, as we can guarantee nothing will rearrange ArmSME operations before we emit intrinsics (which could invalidate the tile allocation). The hope is for ArmSME operations to have no hidden state/side effects and allow easily lowering dialects such as `vector` and `arith` to SME, without making assumptions about how the input IR looks, as the semantics of the operations will be the same. That is, no (new) side effects and the IR follows the rules of SSA (a value will never change). The aim is correctness, so we have a base for working on optimizations.	2024-05-14 14:59:01 +01:00
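The greedy linear scan described in the commit above can be sketched as follows. This is a minimal standalone model under stated assumptions, not the allocator from the patch: `LiveRange` and `allocateTiles` are hypothetical names, live ranges are modelled as half-open intervals `[start, end)` over program points, all tiles are treated as interchangeable, and running out of tiles is reported as a failure.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// A live range covering program points [start, end). Hypothetical type.
struct LiveRange {
  unsigned start;
  unsigned end;
  std::optional<unsigned> tileId; // Filled in by the allocator.
};

// Greedy linear scan: visit ranges in order of their start point, release the
// tiles of ranges that have already ended, then take the lowest free tile.
// Returns false if some range could not be given a tile.
bool allocateTiles(std::vector<LiveRange> &ranges, unsigned numTiles) {
  std::vector<LiveRange *> order;
  for (LiveRange &range : ranges)
    order.push_back(&range);
  std::stable_sort(order.begin(), order.end(),
                   [](const LiveRange *a, const LiveRange *b) {
                     return a->start < b->start;
                   });

  std::vector<bool> inUse(numTiles, false);
  std::vector<LiveRange *> active;
  for (LiveRange *range : order) {
    // Expire ranges that end at or before the current program point.
    active.erase(std::remove_if(active.begin(), active.end(),
                                [&](LiveRange *live) {
                                  if (live->end > range->start)
                                    return false;
                                  assert(live->tileId && "active ranges have a tile");
                                  inUse[*live->tileId] = false;
                                  return true;
                                }),
                 active.end());

    auto freeTile = std::find(inUse.begin(), inUse.end(), false);
    if (freeTile == inUse.end())
      return false; // Out of tiles.
    *freeTile = true;
    range->tileId = static_cast<unsigned>(freeTile - inUse.begin());
    active.push_back(range);
  }
  return true;
}
```

With two tiles and the ranges `[0, 4)`, `[1, 3)`, `[3, 6)`, the third range reuses the tile released by the second, since the second range has ended by program point 3.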
Cullen Rhodes	fff86c6111	[mlir][ArmSME] Support 4-way widening outer products (#79288 ) This patch introduces support for 4-way widening outer products. This enables the fusion of 4 'arm_sme.outerproduct' operations that are chained via the accumulator into single widened operations. Changes: - Adds the following operations: - smopa_4way, smops_4way - umopa_4way, umops_4way - sumopa_4way, sumops_4way - usmopa_4way, usmops_4way - Implements conversions for the above ops to intrinsics in ArmSMEToLLVM. - Extends 'arm-sme-outer-product' pass. For a detailed description of these operations see the 'arm_sme.smopa_4way' description.	2024-02-07 08:17:47 +00:00
Cullen Rhodes	95ef8e3868	[mlir][ArmSME] Support 2-way widening outer products (#78975 ) This patch introduces support for 2-way widening outer products. This enables the fusion of 2 'arm_sme.outerproduct' operations that are chained via the accumulator into a 2-way widening outer product operation. Changes: - Add 'llvm.aarch64.sme.[us]mop[as].za32' intrinsics for 2-way variants. These map to instruction variants added in SME2 and use different intrinsics. Intrinsics are already implemented for widening variants from SME1. - Adds the following operations: - fmopa_2way, fmops_2way - smopa_2way, smops_2way - umopa_2way, umops_2way - Implements conversions for the above ops to intrinsics in ArmSMEToLLVM. - Adds a pass 'arm-sme-outer-product-fusion' that fuses 'arm_sme.outerproduct' operations. For a detailed description of these operations see the 'arm_sme.fmopa_2way' description. The reason for introducing many operations rather than one is the signed/unsigned variants can't be distinguished with types (e.g., ui16, si16) since 'arith.extui' and 'arith.extsi' only support signless integers. A single operation would require this information and an attribute (for example) for the sign doesn't feel right if floating-point types are also supported where this wouldn't apply. Furthermore, the SME FP8 extensions (FEAT_SME_F8F16, FEAT_SME_F8F32) introduce FMOPA 2-way (FP8 to FP16) and 4-way (FP8 to FP32) variants but no subtract variant. Whilst these are not supported in this patch, it felt simpler to have separate ops for add/subtract given this.	2024-01-31 09:13:18 +00:00
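The fusion in the commit above rests on an arithmetic identity that can be checked in a few lines. This is a hedged numeric sketch, not dialect code: `outerProduct`, `interleave`, and `outerProduct2Way` are hypothetical helpers, the interleaved operand layout is an assumption made for illustration, and the narrow-to-wide element conversion is not modelled (everything is `float`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;
using Tile = std::vector<std::vector<float>>;

// acc += lhs (outer product) rhs, for N-element vectors and an N x N tile.
void outerProduct(Tile &acc, const Vec &lhs, const Vec &rhs) {
  for (std::size_t i = 0; i < lhs.size(); ++i)
    for (std::size_t j = 0; j < rhs.size(); ++j)
      acc[i][j] += lhs[i] * rhs[j];
}

// Interleave two N-element vectors into {a[0], b[0], a[1], b[1], ...}.
Vec interleave(const Vec &a, const Vec &b) {
  assert(a.size() == b.size());
  Vec out;
  for (std::size_t i = 0; i < a.size(); ++i) {
    out.push_back(a[i]);
    out.push_back(b[i]);
  }
  return out;
}

// Assumed 2-way form: operands hold 2N interleaved elements, and each tile
// element accumulates a 2-element dot product of adjacent pairs.
void outerProduct2Way(Tile &acc, const Vec &lhs, const Vec &rhs) {
  for (std::size_t i = 0; i < acc.size(); ++i)
    for (std::size_t j = 0; j < acc.size(); ++j)
      acc[i][j] += lhs[2 * i] * rhs[2 * j] + lhs[2 * i + 1] * rhs[2 * j + 1];
}
```

Under these assumptions, two outer products chained through the same accumulator produce the same tile as one 2-way operation on the interleaved operands, which is why the chain can be fused.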
Matthias Springer	5fcf907b34	[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260 ) This commit renames 4 pattern rewriter API functions: * `updateRootInPlace` -> `modifyOpInPlace` * `startRootUpdate` -> `startOpModification` * `finalizeRootUpdate` -> `finalizeOpModification` * `cancelRootUpdate` -> `cancelOpModification` The term "root" is a misnomer. The root is the op that a rewrite pattern matches against (https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional). A rewriter must be notified of all in-place op modifications, not just in-place modifications of the root (https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old function names were confusing and have contributed to various broken rewrite patterns. Note: The new function names use the term "modify" instead of "update" for consistency with the `RewriterBase::Listener` terminology (`notifyOperationModified`).	2024-01-17 11:08:59 +01:00
Mehdi Amini	75e185d50c	Apply clang-tidy fixes for readability-simplify-boolean-expr in LegalizeForLLVMExport.cpp (NFC)	2024-01-15 20:59:12 -08:00
Benjamin Maxwell	b0aebbd41a	[mlir][ArmSME] Workaround for old versions of GCC (NFC) (#78046 ) See: https://github.com/llvm/llvm-project/pull/76086#issuecomment-1890424955	2024-01-14 09:18:53 +00:00
Benjamin Maxwell	5417a5fed6	[mlir][ArmSME] Add rudimentary support for tile spills to the stack (#76086 ) This adds very basic (and inelegant) support for something like spilling and reloading tiles, if you use more SME tiles than physically exist. This is purely implemented to prevent the compiler from aborting if a function uses too many tiles (i.e. due to bad unrolling), but is expected to perform very poorly. Currently, this works in two stages: During tile allocation, if we run out of tiles instead of giving up, we switch to allocating 'in-memory' tile IDs. These are tile IDs that start at 16 (which is higher than any real tile ID). A warning will also be emitted for each (root) tile op assigned an in-memory tile ID: ``` warning: failed to allocate SME virtual tile to operation, all tile operations will go through memory, expect degraded performance ``` Everything after this works like normal until `-convert-arm-sme-to-llvm` Here the in-memory tile op: ```mlir arm_sme.tile_op { tile_id = <IN MEMORY TILE> } ``` Is lowered to: ```mlir // At function entry: %alloca = memref.alloca ... : memref<?x?xty> // Around the op: // Swap the contents of %alloca and tile 0. scf.for %slice_idx { %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}> "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx) <{tile_id = 0 : i32}> vector.store %current_slice, %alloca[%slice_idx, %c0] } // Execute op using tile 0. arm_sme.tile_op { tile_id = 0 } // Swap the contents of %alloca and tile 0. // This restores tile 0 to its original state. scf.for %slice_idx { %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}> "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx) <{tile_id = 0 : i32}> vector.store %current_slice, %alloca[%slice_idx, %c0] } ``` This is inserted during the lowering to LLVM as spilling/reloading registers is a very low-level concept, that can't really be modeled correctly at a high level in MLIR. Note: This is always doing the worst case full-tile swap. 
This could be optimized to only spill/load data the tile op will use, which could be just a slice. It's also not making any use of liveness, which could allow reusing tiles. But this is not seen as important, as correct code should only use the available number of tiles.	2024-01-12 14:51:47 +00:00
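The swap, execute, swap-back pattern from the commit above can be modelled on plain arrays. This is an illustrative sketch only: `Tile`, `swapTileWithBuffer`, and `runOnInMemoryTile` are hypothetical names, a tile is modelled as a vector of integer slices, and only the threshold of 16 for in-memory tile IDs is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Slice = std::vector<int>;
using Tile = std::vector<Slice>;

// Per the commit message, in-memory tile IDs start at 16.
constexpr unsigned kInMemoryTileIdBase = 16;
bool isInMemoryTile(unsigned tileId) { return tileId >= kInMemoryTileIdBase; }

// Swap the contents of a tile and a stack buffer, one slice at a time.
void swapTileWithBuffer(Tile &tile, Tile &buffer) {
  assert(tile.size() == buffer.size());
  for (std::size_t slice = 0; slice < tile.size(); ++slice)
    std::swap(tile[slice], buffer[slice]);
}

// Run `op` on an in-memory tile by temporarily borrowing tile 0.
void runOnInMemoryTile(Tile &tile0, Tile &buffer,
                       const std::function<void(Tile &)> &op) {
  swapTileWithBuffer(tile0, buffer); // Tile 0 now holds the spilled data.
  op(tile0);                         // Execute the op using tile 0.
  swapTileWithBuffer(tile0, buffer); // Restore tile 0; result goes to memory.
}
```

After the second swap, tile 0 holds its original contents and the buffer holds the op's result, at the cost of two full-tile copies per operation.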
Benjamin Maxwell	53d48902bc	[mlir][ArmSME] Add arm_sme.streaming_vl operation (#77321 ) This operation provides a convenient way to query the streaming vector length regardless of the streaming mode. This is most useful for functions that call/pass data to streaming functions, but are not streaming themselves. Example: ```mlir %svl_w = arm_sme.streaming_vl <word> ``` Created based on discussion here: https://github.com/llvm/llvm-project/pull/76086#discussion_r1434226352	2024-01-10 10:11:44 +00:00
Benjamin Maxwell	a4e15416b4	[mlir][ArmSME] Move creation of load/store intrinsics to helpers (NFC) (#76168 ) Also, for consistency make the ZeroOp lowering switch on the ArmSMETileType, rather than the element bit width.	2023-12-21 17:46:12 +00:00
Benjamin Maxwell	01e40a8a3d	[mlir][ArmSME] Remove ArmSMETypeConverter (and configure LLVM one instead) (#73639 ) This patch removes the ArmSMETypeConverter, and instead updates `populateArmSMEToLLVMConversionPatterns()` to add an ArmSME vector type conversion to the existing LLVMTypeConverter. This makes it easier to add these patterns to an existing `-to-llvm` lowering pass.	2023-12-04 17:02:48 +00:00
Benjamin Maxwell	eaff02f28e	[mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253 ) This reworks the ArmSME dialect to use attributes for tile allocation. This has a number of advantages and corrects some issues with the previous approach: * Tile allocation can now be done ASAP (i.e. immediately after `-convert-vector-to-arm-sme`) * SSA form for control flow is now supported (e.g.`scf.for` loops that yield tiles) * ArmSME ops can be converted to intrinsics very late (i.e. after lowering to control flow) * Tests are simplified by removing constants and casts * Avoids correctness issues with representing LLVM `immargs` as MLIR values - The tile ID on the SME intrinsics is an `immarg` (so is required to be a compile-time constant), `immargs` should be mapped to MLIR attributes (this is already the case for intrinsics in the LLVM dialect) - Using MLIR values for `immargs` can lead to invalid LLVM IR being generated (and passes such as -cse making incorrect optimizations) As part of this patch we bid farewell to the following operations: ```mlir arm_sme.get_tile_id : i32 arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32> arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32 ``` These are now replaced with: ```mlir // Allocates a new tile with (indeterminate) state: arm_sme.get_tile : vector<[4]x[4]xi32> // A placeholder operation for lowering ArmSME ops to intrinsics: arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32> ``` The new tile allocation works by operations implementing the `ArmSMETileOpInterface`. This interface says that an operation needs to be assigned a tile ID, and may conditionally allocate a new SME tile. Operations allocate a new tile by implementing... ```c++ std::optional<arm_sme::ArmSMETileType> getAllocatedTileType() ``` ...and returning what type of tile the op allocates (ZAB, ZAH, etc). Operations that don't allocate a tile return `std::nullopt` (which is the default behaviour). 
Currently the following ops are defined as allocating: ```mlir arm_sme.get_tile arm_sme.zero arm_sme.tile_load arm_sme.outerproduct // (if no accumulator is specified) ``` Allocating operations become the roots for the tile allocation pass, which currently just (naively) assigns all transitive uses of a root operation the same tile ID. However, this is enough to handle current use cases. Once tile IDs have been allocated subsequent rewrites can forward the tile IDs to any newly created operations.	2023-11-30 10:22:22 +00:00
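The naive root-based assignment described in the commit above can be sketched on a toy use graph. This is an assumption-laden model rather than the pass itself: `ToyOp`, `allocatesTile`, and `assignTileIds` are hypothetical, ops are identified by index, tile types and the number of physical tiles are ignored, and each root simply gets the next unused ID.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy operation (hypothetical type). `allocatesTile` plays the role of an op
// whose getAllocatedTileType() returns a value; `users` lists the indices of
// ops that consume this op's result.
struct ToyOp {
  bool allocatesTile;
  std::vector<unsigned> users;
  std::optional<unsigned> tileId;
};

// Give every allocating (root) op a fresh tile ID and forward that ID to all
// transitive users of the root.
void assignTileIds(std::vector<ToyOp> &ops) {
  unsigned nextTileId = 0;
  for (unsigned root = 0; root < ops.size(); ++root) {
    if (!ops[root].allocatesTile)
      continue;
    unsigned tileId = nextTileId++;
    std::vector<unsigned> worklist = {root};
    while (!worklist.empty()) {
      unsigned index = worklist.back();
      worklist.pop_back();
      assert(index < ops.size() && "user index out of range");
      if (ops[index].tileId)
        continue; // Already assigned.
      ops[index].tileId = tileId;
      for (unsigned user : ops[index].users)
        worklist.push_back(user);
    }
  }
}
```

Because every root takes a new ID and nothing is ever released, this scheme can run out of tiles quickly, which is the limitation the later liveness-based allocator addresses.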
Benjamin Maxwell	dff97c1e4c	[mlir][ArmSME] Move ArmSME -> intrinsics lowerings to `convert-arm-sme-to-llvm` pass (#72890 ) This gives more flexibility over when these lowerings are performed, without also lowering unrelated vector ops. This is NFC (other than adding a new `-convert-arm-sme-to-llvm` pass).	2023-11-22 13:36:36 +00:00

17 Commits