Fabian Ritter f2749f6645
[LowerMemIntrinsics][AMDGPU] Optimize memset.pattern lowering (#185901)
This patch changes the lowering of the [experimental.memset.pattern intrinsic](https://llvm.org/docs/LangRef.html#llvm-experimental-memset-pattern-intrinsic)
to match the optimized memset and memcpy lowering when possible. (The tl;dr of
memset.pattern is that it is like memset, except that you can use it to set
values that are wider than a single byte.)

The memset.pattern lowering now queries `TTI::getMemcpyLoopLoweringType` for a
preferred memory access type. If the size of that type is a multiple of the set
value's type, and if both types have consistent store and alloc sizes (since
memset.pattern behaves in a way that is not well suitable for access widening
if store and alloc size differ), the memset.pattern is lowered into two loops:
a main loop that stores a sufficiently wide vector splat of the SetValue with
the preferred memory access type and a residual loop that covers the remaining
set values individually.

In contrast to the memset lowering, this patch doesn't include a specialized
lowering for residual loops with known constant lengths. Loops that are
statically known to be unreachable will not be emitted.

For backends that don't override `TTI::getMemcpyLoopLoweringType`, the
generated code is mostly unchanged except for more consistent basic block
names, no more `br i1 false` for memset.patterns with known size, and a flipped
loop condition for memset.patterns with known size (see test changes).

This is a follow-up to a similar patch for memset: #169040
2026-03-13 10:37:33 +01:00
..