llvm-project

[mlir][AMDGPU] Add canonicalization pattern to pack scales for ScaledMFMAOp (#155951)

The ScaledMFMAOp accepts scales as a vector of 4 bytes
(`vector<4xf8E8M0FNU>`) that can be stored in a single register, with a
particular scale selected using the `OpSel` attribute. Currently, we
only use one byte of this 4-byte vector, resulting in 3 wasted
registers.

This is fixed by identifying single-byte extractions and rewriting
them into extractions of 4-byte vectors.

Example:
```
  %unit = vector.extract %ScaleSrc[offsets] : f8E8M0FNU from vector<?x?x?xf8E8M0FNU>
  %scale = vector.insert %unit, ... : f8E8M0FNU into vector<4xf8E8M0FNU>
  amdgpu.scaled_mfma(%scale[0] * ...
```
to
```
  %reshaped = vector.shape_cast %ScaleSrc : vector<?x?x?xf8E8M0FNU> to vector<?x4xf8E8M0FNU>
  %scale = vector.extract %reshaped[?] : vector<4xf8E8M0FNU> from vector<?x4xf8E8M0FNU>
  amdgpu.scaled_mfma(%scale[0-3] * ...
```
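
The index arithmetic implied by the rewrite can be sketched as follows. This is an illustrative model only, not the pattern's actual code: the function name `pack_scale_index` is hypothetical, and it assumes static offsets, a source whose element count is a multiple of 4, and the row-major element order that `vector.shape_cast` preserves. Under those assumptions, the extracted element's linear position gives the row of the reshaped `?x4` vector and the byte within that row.

```python
from math import prod


def pack_scale_index(shape, offsets, width=4):
    """Map a single-element extraction to (row, op_sel) in a ?x4 view.

    Hypothetical helper for illustration; the real canonicalization may
    handle more cases (or reject some) than this sketch does.
    """
    if len(shape) != len(offsets):
        raise ValueError("need one offset per dimension")
    if prod(shape) % width != 0:
        raise ValueError("element count must be a multiple of the pack width")

    # Row-major linearization of the static extraction offsets.
    linear = 0
    for dim, off in zip(shape, offsets):
        if not 0 <= off < dim:
            raise ValueError("offset out of bounds")
        linear = linear * dim + off

    # row picks the 4-byte vector; op_sel picks the byte inside it.
    return linear // width, linear % width


# Element [1, 0, 5] of a 2x1x8 source sits at linear position 13.
row, op_sel = pack_scale_index((2, 1, 8), (1, 0, 5))
print(row, op_sel)
```

In this model, `row` corresponds to the index used in the `vector.extract` on `%reshaped`, and `op_sel` to the index in `%scale[0-3]`.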

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>

2025-09-18 19:25:14 +00:00

[mlir][AMDGPU] Add canonicalization pattern to pack scales for ScaledMFMAOp (#155951)

2025-09-18 19:25:14 +00:00

Transforms

[MLIR][BUG] fix {$VARIABLE} usage in CMakeLists.txt (#156183)

2025-09-02 10:32:51 +08:00

Utils

[mlir][amdgpu] Align Chipset with TargetParser (#107720)

2024-09-09 11:12:26 -04:00

CMakeLists.txt

[mlir][amdgpu] Remove shared memory optimization pass (#88225)

2024-04-11 11:07:17 -04:00