The two paterns for handlig vector.maskedload on AMD GPUs had an overlap
- both the "scalar mask becomes an if statement" pattern and the "masked
loads become a normal load + a select on buffers" patterns could handle
a load with a broadcast mask on a fat buffer resource.
This commet add checks to resolve the overlap.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
This patch adds a better maskedload/maskedstore lowering on amdgpu
backend for loads which are either fully masked or fully unmasked. For
these cases, we can either generate a oob buffer load with no if
condition, or we can generate a normal load with a if condition (if no
fat_raw_buffer space).
This PR reworks https://github.com/llvm/llvm-project/pull/131803.
Instead of applying the optimization on transfer_read op, which is too
high level, it redirect the pre-existing pattern onto maskedload op.
This simplified the implementation of the lowering pattern. This also
allows moving the usage of the pass to a target dependent pipeline.
Signed-off-by: jerryyin <zhuoryin@amd.com>