These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
This patch adds a better maskedload/maskedstore lowering on the amdgpu
backend for loads that are either fully masked or fully unmasked. For
these cases, we can either generate an out-of-bounds (OOB) buffer load
with no if condition, or generate a normal load guarded by an if
condition (when there is no fat_raw_buffer space).
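The two strategies can be illustrated with a small Python model. This is
only a sketch of the intended semantics, not the MLIR pattern itself: the
helper names, the zero result for masked-off lanes (i.e. a zero
passthrough), and the choice of an offset just past the end of the buffer
are all assumptions made for illustration.

```python
from typing import List, Sequence


def buffer_load(memory: Sequence[int], base: int, width: int) -> List[int]:
    # Assumed model of a bounds-checked buffer load:
    # any lane that falls outside the buffer reads as 0.
    return [memory[base + i] if 0 <= base + i < len(memory) else 0
            for i in range(width)]


def plain_load(memory: Sequence[int], base: int, width: int) -> List[int]:
    # Ordinary load with no bounds checking; only safe when the access is valid.
    return list(memory[base:base + width])


def lower_uniform_masked_load(memory: Sequence[int], base: int,
                              mask: Sequence[bool],
                              use_buffer_load: bool) -> List[int]:
    # Hypothetical helper: handles only masks that are all-true or all-false.
    if any(mask) and not all(mask):
        raise ValueError("sketch covers only fully masked or fully unmasked loads")
    width = len(mask)
    enabled = all(mask)
    if use_buffer_load:
        # Branch-free path: when the load is masked off, redirect it to an
        # out-of-bounds offset so the bounds check yields zeros.
        offset = base if enabled else len(memory)
        return buffer_load(memory, offset, width)
    # Fallback path: a normal load guarded by an if condition.
    if enabled:
        return plain_load(memory, base, width)
    return [0] * width
```

Both paths return the same values for a uniform mask; the difference is
that the buffer path needs no control flow.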
This PR reworks https://github.com/llvm/llvm-project/pull/131803.
Instead of applying the optimization on the transfer_read op, which is
too high level, it redirects the pre-existing pattern onto the maskedload
op. This simplifies the implementation of the lowering pattern. It also
allows moving the usage of the pass to a target-dependent pipeline.
Signed-off-by: jerryyin <zhuoryin@amd.com>