Clang uses a long-time special handling of the case where 3 element
vector loads and stores are performed as 4 element, and then a
shufflevector is used to extract the used elements. Odd sized vector
codegen should now work reasonably well.
This patch removes the compiler argument `-fpreserve-vec3-type` and adds
a target hook to determine if the special handling of vector type is
needed.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Essentially, this makes this ill-formed:
```c++
using mat4 = _BitInt(12) [[clang::matrix_type(3, 3)]];
```
This matches preexisting behaviour for vector types (e.g.
`ext_vector_type`), and given that LLVM IR intrinsics for matrices also
take vector types, it seems like a sensible thing to do.
This is currently especially problematic since we sometimes lower matrix
types to LLVM array types instead, and while e.g. `[4 x i32]` and `<4 x
i32>` *probably* have the same similar memory layout (though I don’t
think it’s sound to rely on that either, see #117486), `[4 x i12]` and
`<4 x i12>` definitely don’t.