Ahmed S. Taei 22a2d74c0c
[NVPTX] Emit ld.v4.b16 for loading <4 x bfloat> (#109069)
This PR enables emitting a single load instruction for <4 x bfloat>;
otherwise, two ld.b32 loads are generated.
2024-09-17 21:06:46 -07:00