Krzysztof Drewniak e7dd7b81ac
[AMDGPU] tensor_{load_to/store_from}_lds => ..._d2 simplification (#171540)
This commit adds the rewrite

```
llvm.amdgcn.tensor.{load.to/store.from}.lds(
  <4 x i32> %d0, <8 x i32> %d1, <4 x i32> zeroinitializer,
  <4 x i32> zeroinitializer, i32 [cachepolicy])
=>
llvm.amdgcn.tensor.{load.to/store.from}.lds.d2(
  <4 x i32> %$d0, <8 x i32> %d1, i32 [cachepolicy])
```

This is justifed because, when the short encoding that uses the NULL
SGPR for registers 2 and 3 is used, the hardware acts as if those
registers were 0, including in the gather mode.

It is always safe not to run this transformation.

(Note: tests were LLM'd and then tweaked.)
2025-12-15 08:11:03 -08:00
..