14 Commits

Author SHA1 Message Date
Frank Schlimbach
d5746d73ce
eliminating g++ warnings (#105520)
Eliminating g++ warnings. Mostly declaring "[[maybe_unused]]", adding
return statements where missing and fixing casts.

@rengolin

---------

Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
Co-authored-by: Renato Golin <rengolin@systemcall.eu>
2024-10-18 21:20:47 +01:00
Chenguang Wang
a41a4b8fed
Revert "[mlir][ArmSME] Suppress potential unused warning (#99573)" (#99578)
This reverts commit 05bce3f079b677edd0efd28e3923f4776ffb8b59.

The work was already done in 99faa03.
2024-07-18 14:59:26 -07:00
Chenguang Wang
05bce3f079
[mlir][ArmSME] Suppress potential unused warning (#99573)
When building in release mode, the assert will be dropped, making
`removed` unused.
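
A minimal sketch of the pattern described here, not the code from
`TileAllocation.cpp`. The helper `eraseValue` and its container are
hypothetical and only illustrate why an assert-only variable needs
`[[maybe_unused]]`:

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: erases `value` from `live` and checks, in debug
// builds only, that the value was actually present.
inline void eraseValue(std::set<int> &live, int value) {
  // With NDEBUG defined, assert(...) expands to nothing, so without the
  // attribute `removed` would trigger -Wunused-variable.
  [[maybe_unused]] bool removed = live.erase(value) > 0;
  assert(removed && "value was not live");
}
```

The erase itself stays outside the assert, so it still runs in release
builds.
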
2024-07-18 14:55:22 -07:00
Kazu Hirata
99faa038c6
[mlir] Fix a warning
This patch fixes:

  mlir/lib/Dialect/ArmSME/Transforms/TileAllocation.cpp:621:16: error:
  unused variable 'removed' [-Werror,-Wunused-variable]
2024-07-18 14:41:15 -07:00
Benjamin Maxwell
eed72d4381
[mlir][ArmSME] Support filling liveness 'holes' in the tile allocator (#98350)
Holes in a live range are points where the corresponding value does not
need to be in a tile/register. If the tile allocator keeps track of
these holes it can reuse tiles for more values (avoiding spills).

Take this simple example:

```mlir
func.func @example(%cond: i1) {
  %tileA = arm_sme.get_tile : vector<[4]x[4]xf32>
  cf.cond_br %cond, ^bb2, ^bb1
^bb1:
  // If we end up here we never use %tileA again!
  %tileB = arm_sme.get_tile : vector<[4]x[4]xf32>
  "test.some_use"(%tileB) : (vector<[4]x[4]xf32>) -> ()
  cf.br ^bb3
^bb2:
  "test.some_use"(%tileA) : (vector<[4]x[4]xf32>) -> ()
  cf.br ^bb3
^bb3:
  return
}
```

If you were to calculate the liveness of %tileA and %tileB, you'd see
there is a hole in the liveness of %tileA in bb1:

```
      %tileA  %tileB
^bb0:  Live
^bb1:          Live
^bb2:  Live
```

The tile allocator can make use of that hole and reuse the tile ID it
assigned to %tileA for %tileB.
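
The overlap check behind this can be sketched as follows. This is an
illustration only: the `LiveRange` representation (a list of half-open
intervals over program points) and the helper `overlaps` are assumptions,
not the allocator's actual data structures.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// A live range as a list of half-open [start, end) intervals over program
// points. Gaps between the intervals are the "holes".
using Interval = std::pair<unsigned, unsigned>;
using LiveRange = std::vector<Interval>;

// Returns true if any interval of `a` intersects any interval of `b`.
// Two values can share a tile only if their live ranges do not overlap.
inline bool overlaps(const LiveRange &a, const LiveRange &b) {
  for (const Interval &x : a)
    for (const Interval &y : b)
      if (x.first < y.second && y.first < x.second)
        return true;
  return false;
}
```

Numbering bb0, bb1 and bb2 as points 0, 1 and 2, %tileA is
`{{0, 1}, {2, 3}}` and %tileB is `{{1, 2}}`, which do not overlap. Without
the hole, %tileA would be `{{0, 3}}` and would conflict with %tileB.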
2024-07-18 20:13:45 +01:00
Christian Ulmann
b00e0c1671
[MLIR][Analysis] Consolidate topological sort utilities (#92563)
This PR attempts to consolidate the different topological sort utilities
into one place. It adds them to the analysis folder because the
`SliceAnalysis` uses some of these.

There are now two different sorting strategies: 
1. Sort only according to SSA use-def chains
2. Sort while taking regions into account. This requires a much more
elaborate traversal and cannot be applied on graph regions that easily.

This additionally reimplements the region aware topological sorting
because the previous implementation had an exponential space complexity.
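
For illustration, the first strategy (sorting only by use-def chains) can
be sketched with Kahn's algorithm. This is not the MLIR utility itself:
`UseDefGraph` and `topologicalSort` are hypothetical stand-ins that model
operations as indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a list of operations: `operands[i]` holds the
// indices of the operations that define the values operation `i` uses.
using UseDefGraph = std::vector<std::vector<std::size_t>>;

// Returns operation indices ordered so that every definition precedes its
// users. Returns an empty vector if the use-def graph contains a cycle.
inline std::vector<std::size_t> topologicalSort(const UseDefGraph &operands) {
  const std::size_t n = operands.size();
  std::vector<std::size_t> pending(n, 0);          // unscheduled operands
  std::vector<std::vector<std::size_t>> users(n);  // definition -> users
  for (std::size_t op = 0; op < n; ++op) {
    pending[op] = operands[op].size();
    for (std::size_t def : operands[op])
      users[def].push_back(op);
  }
  std::vector<std::size_t> order;
  for (std::size_t op = 0; op < n; ++op)
    if (pending[op] == 0)
      order.push_back(op);
  // `order` doubles as the work queue.
  for (std::size_t i = 0; i < order.size(); ++i)
    for (std::size_t user : users[order[i]])
      if (--pending[user] == 0)
        order.push_back(user);
  if (order.size() != n)
    order.clear();
  return order;
}
```

Time and space are linear in the number of operations and operands. The
region-aware strategy needs more than this, since it also has to respect
block and region structure.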

I'm open to suggestions on how to combine this further or how to fuse
the test passes.
2024-05-22 08:48:10 +02:00
Christian Ulmann
9e39a0c723
[MLIR][ArmSME] Fix for block sorting refactor
This commit fixes a breakage introduced by changing the name of the
block sorting function.

Related PR: https://github.com/llvm/llvm-project/pull/92558
2024-05-17 15:59:04 +00:00
Benjamin Maxwell
041baf2f60
[mlir][ArmSME] Use liveness information in the tile allocator (#90448)
This patch rewrites the ArmSME tile allocator to use liveness
information to make better tile allocation decisions and improve the
correctness of the ArmSME dialect. The algorithm used here is a linear
scan over live ranges, where live ranges are assigned to tiles as they
appear in the program (chronologically). Live ranges release their
assigned tile ID when the current program point is past their end.
This is a greedy algorithm, chosen mainly to keep the implementation
relatively straightforward and because it seems to be sufficient for
most kernels (e.g. matmuls) that use ArmSME. The general steps are
roughly those from
https://link.springer.com/content/pdf/10.1007/3-540-45937-5_17.pdf,
though there have been a few simplifications and assumptions made for
our use case.
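
A rough, self-contained sketch of a greedy linear scan of this kind. It is
not the allocator's code: `LiveRange`, `linearScan`, and the half-open
`[start, end)` program-point model are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical live range: half-open [start, end) over program points.
struct LiveRange {
  unsigned start;
  unsigned end;
  std::optional<unsigned> tileId; // filled in by linearScan
};

// Visits ranges in order of their start point, releases the tiles of ranges
// that have ended, then takes the lowest free tile. Ranges that find no free
// tile keep an empty tileId. Returns the number of such ranges.
inline unsigned linearScan(std::vector<LiveRange> &ranges, unsigned numTiles) {
  std::vector<LiveRange *> order;
  for (LiveRange &range : ranges)
    order.push_back(&range);
  std::stable_sort(order.begin(), order.end(),
                   [](const LiveRange *a, const LiveRange *b) {
                     return a->start < b->start;
                   });
  std::vector<bool> inUse(numTiles, false);
  std::vector<LiveRange *> active;
  unsigned unallocated = 0;
  for (LiveRange *current : order) {
    // Expire ranges that end at or before the current program point.
    std::vector<LiveRange *> stillActive;
    for (LiveRange *range : active) {
      if (range->end <= current->start)
        inUse[*range->tileId] = false;
      else
        stillActive.push_back(range);
    }
    active = std::move(stillActive);
    // Greedily pick the lowest free tile ID, if there is one.
    std::optional<unsigned> freeTile;
    for (unsigned id = 0; id < numTiles && !freeTile; ++id)
      if (!inUse[id])
        freeTile = id;
    if (!freeTile) {
      ++unallocated;
      continue;
    }
    inUse[*freeTile] = true;
    current->tileId = freeTile;
    active.push_back(current);
  }
  return unallocated;
}
```

With two tiles and ranges `[0,4)`, `[1,3)`, `[3,6)`, `[2,5)`, the first two
ranges get tiles 0 and 1, `[2,5)` finds no free tile, and `[3,6)` reuses
tile 1 once `[1,3)` has ended.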

Hopefully, the only changes needed for a user of the ArmSME dialect are
that:

- `-allocate-arm-sme-tiles` will no longer be a standalone pass
  - `-test-arm-sme-tile-allocation` is only for unit tests
- `-convert-arm-sme-to-llvm` must happen after `-convert-scf-to-cf`
  - SME tile allocation is now part of the LLVM conversion

By integrating this into the `ArmSME -> LLVM` conversion we can allow
high-level (value-based) ArmSME operations to be side-effect-free, as we
can guarantee nothing will rearrange ArmSME operations before we emit
intrinsics (which could invalidate the tile allocation).

The hope is for ArmSME operations to have no hidden state/side effects
and allow easily lowering dialects such as `vector` and `arith` to SME,
without making assumptions about how the input IR looks, as the
semantics of the operations will be the same. That is, there are no (new)
side effects and the IR follows the rules of SSA (a value will never change).

The aim is correctness, so we have a base for working on optimizations.
2024-05-14 14:59:01 +01:00
Matthias Springer
5fcf907b34
[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
2024-01-17 11:08:59 +01:00
Matthias Springer
e2bb47caa6
[mlir][Arm] Fix invalid rewrite pattern API violations (#78246)
This commit fixes rewrite pattern API violations:
* Rewrite patterns must return "failure" if the IR was not modified.
* In-place op modifications must be communicated to the rewriter
(`updateRootInPlace`).

This commit fixes `test/Dialect/ArmSVE/legalize-vector-storage.mlir`,
`test/Dialect/ArmSME/vector-ops-to-llvm.mlir`,
`test/Dialect/ArmSME/tile-allocation-invalid.mlir`,
`test/Conversion/ArmSMEToLLVM/arm-sme-to-llvm.mlir`,
`test/Conversion/ArmSMEToLLVM/tile-spills-and-fills.mlir`,
`test/Conversion/ArmSMEToLLVM/unsupported.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.

---------

Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
2024-01-16 13:26:39 +01:00
Benjamin Maxwell
5417a5fed6
[mlir][ArmSME] Add rudimentary support for tile spills to the stack (#76086)
This adds very basic (and inelegant) support for something like spilling
and reloading tiles, if you use more SME tiles than physically exist.

This is purely implemented to prevent the compiler from aborting if a
function uses too many tiles (e.g. due to bad unrolling), but is
expected to perform very poorly.

Currently, this works in two stages:

During tile allocation, if we run out of tiles, instead of giving up we
switch to allocating 'in-memory' tile IDs. These are tile IDs that start
at 16 (which is higher than any real tile ID). A warning will also be
emitted for each (root) tile op assigned an in-memory tile ID:

```
warning: failed to allocate SME virtual tile to operation, all tile operations will go through memory, expect degraded performance
```
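
A minimal sketch of that fallback. Only the base value 16 comes from this
description; `TileIds`, `allocate`, and `isInMemoryTile` are illustrative
names, and this sketch never releases tiles.

```cpp
#include <cassert>

// First in-memory tile ID, as stated above: 16, which is higher than any
// real tile ID.
constexpr unsigned kInMemoryTileIdBase = 16;

// Hypothetical allocator state; names are illustrative only.
struct TileIds {
  unsigned numRealTiles; // tiles available for the element type in use
  unsigned nextRealTile = 0;
  unsigned nextInMemoryTile = kInMemoryTileIdBase;

  // Returns a real tile ID while any remain, then falls back to in-memory
  // tile IDs instead of failing.
  unsigned allocate() {
    assert(numRealTiles <= kInMemoryTileIdBase);
    if (nextRealTile < numRealTiles)
      return nextRealTile++;
    return nextInMemoryTile++;
  }
};

// Later stages can tell the two kinds of ID apart by a single comparison.
inline bool isInMemoryTile(unsigned tileId) {
  return tileId >= kInMemoryTileIdBase;
}
```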

Everything after this works like normal until `-convert-arm-sme-to-llvm`.

Here the in-memory tile op:

```mlir
arm_sme.tile_op { tile_id = <IN MEMORY TILE> }
```

Is lowered to:

```mlir
// At function entry:
%alloca = memref.alloca ... : memref<?x?xty>

// Around the op:
// Swap the contents of %alloca and tile 0.
scf.for %slice_idx {
  %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}>
  "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx)  <{tile_id = 0 : i32}>
  vector.store %current_slice, %alloca[%slice_idx, %c0]
}
// Execute op using tile 0.
arm_sme.tile_op { tile_id = 0 }
// Swap the contents of %alloca and tile 0.
// This restores tile 0 to its original state.
scf.for %slice_idx {
  %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}>
  "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx)  <{tile_id = 0 : i32}>
  vector.store %current_slice, %alloca[%slice_idx, %c0]
}
```
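
The swap, execute, swap structure can be modelled in plain C++ to show why
the second swap restores tile 0. This is a sketch under simplifying
assumptions: `Tile`, `swapTile`, and `runWithInMemoryTile` are hypothetical,
and a tile is modelled as a vector of slices rather than as ZA state.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Slice = std::vector<float>;
using Tile = std::vector<Slice>;

// Swaps the contents of tile 0 and the stack slot, one slice at a time.
inline void swapTile(Tile &tile0, Tile &stackSlot) {
  assert(tile0.size() == stackSlot.size());
  for (std::size_t slice = 0; slice < tile0.size(); ++slice)
    std::swap(tile0[slice], stackSlot[slice]);
}

// Runs `op` on the in-memory tile by temporarily moving it into tile 0.
template <typename Op>
void runWithInMemoryTile(Tile &tile0, Tile &stackSlot, Op op) {
  swapTile(tile0, stackSlot); // in-memory tile is now in tile 0
  op(tile0);                  // execute the op using tile 0
  swapTile(tile0, stackSlot); // result goes back to memory, tile 0 restored
}
```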

This is inserted during the lowering to LLVM as spilling/reloading
registers is a very low-level concept that can't really be modeled
correctly at a high level in MLIR.

Note: This is always doing the worst case full-tile swap. This could be
optimized to only spill/load data the tile op will use, which could be
just a slice. It's also not making any use of liveness, which could
allow reusing tiles. But these are not seen as important, as correct code
should only use the available number of tiles.
2024-01-12 14:51:47 +00:00
Matthias Springer
db8a119e8f
[mlir][ArmSME] Fix invalid rewriter API usage (#76123)
When operations are modified in-place, the rewriter must be notified.
This commit fixes `mlir/test/Conversion/ArmSMEToLLVM/unsupported.mlir`,
`mlir/test/Dialect/ArmSME/tile-zero-masks.mlir` and
`mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` enabled.
2023-12-21 17:39:36 +09:00
Benjamin Maxwell
eaff02f28e
[mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253)
This reworks the ArmSME dialect to use attributes for tile allocation.
This has a number of advantages and corrects some issues with the
previous approach:

* Tile allocation can now be done ASAP (i.e. immediately after
  `-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g. `scf.for` loops that
  yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after
  lowering to control flow)
* Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR
  values
  - The tile ID on the SME intrinsics is an `immarg` (so is required to be
    a compile-time constant); `immargs` should be mapped to MLIR attributes
    (this is already the case for intrinsics in the LLVM dialect)
  - Using MLIR values for `immargs` can lead to invalid LLVM IR being
    generated (and passes such as `-cse` making incorrect optimizations)

As part of this patch we bid farewell to the following operations:

```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```

These are now replaced with:
```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```

The new tile allocation works by operations implementing the
`ArmSMETileOpInterface`. This interface says that an operation needs to
be assigned a tile ID, and may conditionally allocate a new SME tile.

Operations allocate a new tile by implementing...
```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```
...and returning what type of tile the op allocates (ZAB, ZAH, etc).

Operations that don't allocate a tile return `std::nullopt` (which is
the default behaviour).
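
The shape of that hook can be sketched in standalone C++. Real MLIR op
interfaces are generated from TableGen, so the virtual base class below is
only an analogy. `TileOp`, `ZeroF32Op`, `MoveSliceOp`, and
`isAllocationRoot` are hypothetical names.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the real ArmSME tile type enum.
enum class ArmSMETileType { ZAB, ZAH, ZAS, ZAD, ZAQ };

struct TileOp {
  virtual ~TileOp() = default;
  // Default behaviour: the op needs a tile ID but does not allocate a tile.
  virtual std::optional<ArmSMETileType> getAllocatedTileType() const {
    return std::nullopt;
  }
};

// An op that always allocates a 32-bit element tile (illustrative only).
struct ZeroF32Op : TileOp {
  std::optional<ArmSMETileType> getAllocatedTileType() const override {
    return ArmSMETileType::ZAS;
  }
};

// An op that only operates on an existing tile keeps the default.
struct MoveSliceOp : TileOp {};

// Ops that report a tile type would be the roots for tile allocation.
inline bool isAllocationRoot(const TileOp &op) {
  return op.getAllocatedTileType().has_value();
}
```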

Currently the following ops are defined as allocating:
```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```

Allocating operations become the roots for the tile allocation pass,
which currently just (naively) assigns all transitive uses of a root
operation the same tile ID. However, this is enough to handle current
use cases.

Once tile IDs have been allocated, subsequent rewrites can forward the
tile IDs to any newly created operations.
2023-11-30 10:22:22 +00:00
Cullen Rhodes
fb54fec726
[mlir][ArmSME] Implement tile allocation
This patch adds a pass '-allocate-sme-tiles' to the ArmSME dialect that
implements allocation of SME ZA tiles.

It does this at the 'func.func' op level by replacing
'arm_sme.get_tile_id' ops with 'arith.constant' ops that represent the
tile number. The tiles in use in a given function are tracked by an
integer function attribute 'arm_sme.tiles_in_use' that is a 16-bit tile
mask with a bit for each 128-bit element tile (ZA0.Q-ZA15.Q), the
smallest ZA tile granule. This is initialized on the first
'arm_sme.get_tile_id' rewrite and updated on each subsequent rewrite.
Mixing of different element tile types is supported.

Section B2.3.2 of the SME spec [1] describes how the 128-bit element
tiles overlap with other element tiles.
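
A sketch of how such a 16-bit mask can encode the overlap. The bit order is
an assumption made for this example (bit q, counting from the least
significant bit, stands for ZAq.Q) and may differ from what the pass uses.
`TileType`, `tileMask`, and `allocateTile` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Each enumerator's value is the number of tiles of that element type:
// ZA0.B, ZA0-1.H, ZA0-3.S, ZA0-7.D, ZA0-15.Q.
enum class TileType : unsigned { ZAB = 1, ZAH = 2, ZAS = 4, ZAD = 8, ZAQ = 16 };

// Mask of the 128-bit element tiles covered by tile `id` of type `type`.
// Tile `id` of a type with N tiles covers ZAq.Q for every q with q % N == id.
inline std::uint16_t tileMask(TileType type, unsigned id) {
  const unsigned numTiles = static_cast<unsigned>(type);
  assert(id < numTiles && "tile ID out of range for this element type");
  std::uint16_t mask = 0;
  for (unsigned q = id; q < 16; q += numTiles)
    mask |= static_cast<std::uint16_t>(1u << q);
  return mask;
}

// Tries to reserve a tile of `type`. On success, updates `tilesInUse`, sets
// `id`, and returns true. Returns false if every tile of that type overlaps
// a tile that is already in use.
inline bool allocateTile(TileType type, std::uint16_t &tilesInUse,
                         unsigned &id) {
  const unsigned numTiles = static_cast<unsigned>(type);
  for (id = 0; id < numTiles; ++id) {
    const std::uint16_t mask = tileMask(type, id);
    if ((tilesInUse & mask) == 0) {
      tilesInUse |= mask;
      return true;
    }
  }
  return false;
}
```

Under this bit order, allocating a 32-bit tile first takes ZA0.S (mask
`0x1111`). A following 64-bit request skips ZA0.D, which overlaps it, and
gets ZA1.D (mask `0x0202`).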

Depends on D154941

[1] https://developer.arm.com/documentation/ddi0616/aa

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D154955
2023-07-18 08:46:40 +00:00