llvm-project

Krzysztof Drewniak a4dd51d72f

[mlir][ArithToAMDGPU] Use native packing support (#150342)

The current arith-to-amdgpu patterns for scaling_extf and scaling_truncf
don't take full advantage of the native packing ability of the
intrinsics being targeted. Scaling extension takes the location of the
two elements to be extended as a constant argument (byte for fp4, half
for fp8), and scaling truncation takes a 32-bit input register and a
byte or half to write the truncated values to.

Not using these features causes unnecessary register pressure.
This PR resolves that inefficiency.

It also adds a test for the expected use case of extending or
truncating a block of 32 values to/from fp4 with a uniform scale to
ensure that this usage has a minimal amount of vector shuffling.
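
As a rough illustration of the packing arithmetic described above (not the
MLIR pass itself, which lives in ArithToAMDGPU.cpp), the Python sketch below
computes which 32-bit register and which byte or half selector would hold each
pair of elements. The function name pair_selectors and the assumption that
pairs are laid out contiguously from the low bits upward are hypothetical and
chosen only for this example.

```python
REGISTER_BITS = 32  # the commit message describes a 32-bit register


def pair_selectors(num_elements, element_bits):
    """Return (register_index, selector) for each adjacent pair of elements.

    Hypothetical helper: assumes elements are packed contiguously, so a pair
    of fp4 values fills one byte and a pair of fp8 values fills one half.
    """
    if num_elements % 2 != 0:
        raise ValueError("elements are selected two at a time")
    pair_bits = 2 * element_bits                      # 8 for fp4, 16 for fp8
    pairs_per_register = REGISTER_BITS // pair_bits   # 4 for fp4, 2 for fp8
    return [
        (pair // pairs_per_register, pair % pairs_per_register)
        for pair in range(num_elements // 2)
    ]


# A block of 32 fp4 values: 16 pairs spread over 4 registers, byte selector 0..3.
fp4_block = pair_selectors(32, element_bits=4)

# A block of 32 fp8 values: 16 pairs spread over 8 registers, half selector 0..1.
fp8_block = pair_selectors(32, element_bits=8)
```

Under these assumptions, a block of 32 fp4 values fits in four packed
registers, and each pair is addressed by a constant selector rather than by
first being shuffled into a separate register.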

2025-07-24 12:26:03 -05:00

ArithToAMDGPU.cpp

[mlir][ArithToAMDGPU] Use native packing support (#150342)

2025-07-24 12:26:03 -05:00

CMakeLists.txt

[MLIR][AMDGPU] Introduce fp16 packed arithmetic (#105688)

2024-08-26 12:48:57 -05:00