This enables more consecutive load folding during
aggressive-instcombine.
The original motivating example was provided by Jeff Byrnes:
https://godbolt.org/z/8ebcTEjTs
Nikita Popov provided another example (https://godbolt.org/z/Gv1j4vjqE)
in his
[comment](https://github.com/llvm/llvm-project/pull/133301#issuecomment-2984905809)
on my original attempt to fix the issue, PR
[#133301](https://github.com/llvm/llvm-project/pull/133301).
This changes the value of `IsFast` returned by
`SITargetLowering::allowsMisalignedMemoryAccessesImpl` to be non-zero for
private and flat addresses if the subtarget supports unaligned scratch
accesses.
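
To make the intent of the change concrete, here is a toy model of the query for an under-aligned access. It is a sketch only, not the LLVM source: `ToyAddrSpace`, `ToySubtarget`, and `toyAllowsMisalignedAccess` are hypothetical names, the feature bit is an assumption, and the value `1` stands in for whatever non-zero value the real hook reports.

```cpp
// Hypothetical, simplified stand-ins. These are NOT the LLVM definitions.
enum class ToyAddrSpace { Private, Flat, Other };

struct ToySubtarget {
  bool UnalignedScratchAccess; // assumed feature bit
};

// Toy model of the query for an under-aligned access.
// Returns whether the access is allowed and, if IsFast is non-null,
// writes a speed hint to it (0 means "not fast").
bool toyAllowsMisalignedAccess(ToyAddrSpace AS, const ToySubtarget &ST,
                               unsigned *IsFast) {
  if (IsFast)
    *IsFast = 0;

  if (AS == ToyAddrSpace::Private || AS == ToyAddrSpace::Flat) {
    if (ST.UnalignedScratchAccess) {
      // The behaviour described above: report a non-zero IsFast.
      // The exact value used by the real hook is not modelled here.
      if (IsFast)
        *IsFast = 1;
      return true;
    }
    return false; // assumption: under-aligned scratch access is rejected
  }

  return false; // other address spaces are not modelled in this sketch
}
```

A caller that only acts when `IsFast` is non-zero would now proceed for private and flat addresses on a subtarget with the feature, and would still bail out without it.
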
This enables aggressive-instcombine to do more folding of consecutive
loads (see
[here](cbd496581f/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp (L811))).
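
For readers unfamiliar with the fold, the source-level shape of the pattern looks roughly like the sketch below. It is illustrative and not taken from the godbolt examples: `loadBytewise` and `loadWide` are hypothetical names, and the equivalence assumes a little-endian target. As far as I understand the pass, it only merges the narrow loads when the target reports the wide access as both allowed and fast, which is why the `IsFast` value matters.

```cpp
#include <cstdint>
#include <cstring>

// Four consecutive byte loads combined with shifts and ors,
// assembling the value in little-endian byte order.
uint32_t loadBytewise(const unsigned char *P) {
  return static_cast<uint32_t>(P[0]) |
         (static_cast<uint32_t>(P[1]) << 8) |
         (static_cast<uint32_t>(P[2]) << 16) |
         (static_cast<uint32_t>(P[3]) << 24);
}

// The folded form: one 32-bit load with no alignment assumption.
// On a little-endian target it yields the same value as loadBytewise.
uint32_t loadWide(const unsigned char *P) {
  uint32_t V;
  std::memcpy(&V, P, sizeof V);
  return V;
}

// Usage sketch: read from a deliberately unaligned address and compare.
// Returns true on a little-endian host.
bool sameOnLittleEndian() {
  const unsigned char Buf[] = {0x00, 0x11, 0x22, 0x33, 0x44};
  return loadBytewise(Buf + 1) == loadWide(Buf + 1);
}
```

The wide form replaces four loads and three shift/or pairs with a single memory access, which is only profitable if the hardware handles the under-aligned access efficiently.
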
Summary performance impact on
[composable_kernel](https://github.com/ROCm/composable_kernel):
|GPU|speedup (geomean*)|
|---|---|
|MI300A| 1.11|
|MI300X| 1.14|
|MI350X| 1.03|
[*] The geomean is taken only across the kernels that were impacted by
this change, not across all CK kernels.