llvm-project

AMDGPU: Report unaligned scratch access as fast if supported by tgt (#158036)

This enables more consecutive load folding during
aggressive-instcombine.

The original motivating example provided by Jeff Byrnes:
https://godbolt.org/z/8ebcTEjTs

Nikita Popov provided a second example, https://godbolt.org/z/Gv1j4vjqE,
while reviewing my original attempt to fix the issue (PR
[#133301](https://github.com/llvm/llvm-project/pull/133301), see his
[comment](https://github.com/llvm/llvm-project/pull/133301#issuecomment-2984905809)).

This changes the value of `IsFast` returned by
`SITargetLowering::allowsMisalignedMemoryAccessesImpl` to be non-zero for
private and flat addresses if the subtarget supports unaligned scratch
accesses.
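
As a rough illustration of the behavior described above, here is a minimal, self-contained sketch. It is not the actual LLVM code: `AddrSpaceSketch`, `SubtargetSketch`, `HasUnalignedScratchAccess` and `allowsMisalignedAccessSketch` are hypothetical names, and the sketch models only a misaligned access, ignoring access size and alignment.

```cpp
// Simplified model of the described behavior. All names are hypothetical
// stand-ins, not the real LLVM declarations.
enum class AddrSpaceSketch { Private, Flat, Other };

struct SubtargetSketch {
  bool HasUnalignedScratchAccess; // assumed feature flag
};

// Models a misaligned access only. Returns whether the access is allowed and,
// if IsFast is non-null, stores a non-zero value when it is also reported fast.
bool allowsMisalignedAccessSketch(const SubtargetSketch &ST,
                                  AddrSpaceSketch AS, unsigned *IsFast) {
  if (IsFast)
    *IsFast = 0;
  if (AS == AddrSpaceSketch::Private || AS == AddrSpaceSketch::Flat) {
    if (ST.HasUnalignedScratchAccess) {
      if (IsFast)
        *IsFast = 1; // non-zero: the unaligned scratch access is reported fast
      return true;
    }
    return false;
  }
  // Other address spaces are outside the scope of this sketch.
  return false;
}
```

A caller that requires both "allowed" and "fast" would therefore accept private and flat accesses only when the subtarget flag is set.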

This enables aggressive-instcombine to do more folding of consecutive
loads (see
[here](cbd496581f/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp (L811))).
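
For readers unfamiliar with the fold, the following sketch shows the kind of source pattern involved: adjacent narrow loads combined with shifts and ors, next to the single wide load they are equivalent to on a little-endian target. The function names are made up for illustration, and whether the fold actually fires depends on the target's cost hooks.

```cpp
#include <cstdint>
#include <cstring>

// Four consecutive byte loads assembled into one 32-bit value
// (least significant byte first).
uint32_t loadBytewiseLE(const uint8_t *P) {
  return static_cast<uint32_t>(P[0]) |
         (static_cast<uint32_t>(P[1]) << 8) |
         (static_cast<uint32_t>(P[2]) << 16) |
         (static_cast<uint32_t>(P[3]) << 24);
}

// The single, possibly unaligned, 32-bit load that yields the same value on a
// little-endian target. memcpy keeps the unaligned read well-defined in C++.
uint32_t loadWide(const uint8_t *P) {
  uint32_t V;
  std::memcpy(&V, P, sizeof(V));
  return V;
}
```

Because `P` carries only byte alignment, replacing the first form with the second is worthwhile only if the target reports the misaligned wide access as fast.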

Summary performance impact on
[composable_kernel](https://github.com/ROCm/composable_kernel):

|GPU|speedup (geomean*)|
|---|---|
|MI300A| 1.11|
|MI300X| 1.14|
|MI350X| 1.03|

[*] The geomean is taken over only the kernels impacted by this change,
not over all CK kernels.
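
For reference, the geometric mean of per-kernel speedups is conventionally computed as below. The notation is mine rather than from the measurements: $s_i$ is the speedup of the $i$-th impacted kernel and $n$ is the number of impacted kernels.

$$
\text{geomean} = \left(\prod_{i=1}^{n} s_i\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n}\ln s_i\right)
$$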

2025-09-15 05:03:02 -05:00

AArch64

…

AMDGPU

AMDGPU: Report unaligned scratch access as fast if supported by tgt (#158036)

2025-09-15 05:03:02 -05:00

ARM

…

X86

[AggressiveInstCombine] Refactor foldLoadsRecursive to use m_ShlOrSelf (#155176)

2025-08-25 20:11:10 +08:00

dbgloc-memchr.ll

…

funnel.ll

…

inline-strcmp-debugloc.ll

…

logic-combine.ll

…

lower-table-based-cttz-basics.ll

[AggressiveInstCombine] Make cttz fold more resilient to non-array geps (#150896)

2025-07-31 16:53:55 +01:00

lower-table-based-cttz-dereferencing-pointer.ll

…

lower-table-based-cttz-non-argument-value.ll

…

lower-table-based-cttz-zero-element.ll

…

masked-cmp.ll

…

memchr.ll

…

negative-lower-table-based-cttz.ll

[AggressiveInstCombine] Make cttz fold more resilient to non-array geps (#150896)

2025-07-31 16:53:55 +01:00

or-shift-chain.ll

…

patterned-load.ll

…

popcount.ll

…

pr50555.ll

…

rotate.ll

…

strncmp-1.ll

…

strncmp-2.ll

…

trunc_ashr.ll

…

trunc_assume.ll

…

trunc_const_expr.ll

…

trunc_lshr.ll

…

trunc_multi_uses.ll

…

trunc_phi.ll

…

trunc_select_cmp.ll

…

trunc_select.ll

…

trunc_shl.ll

…

trunc_udivrem.ll

…

trunc_unreachable_bb.ll

…

trunc_vector_instrs.ll

…

vector-or-load.ll

…