This commit introduces attribute bpf_fastcall to declare BPF functions
that do not clobber some of the caller saved registers (R0-R5).
The idea is to generate the code complying with generic BPF ABI,
but allow compatible Linux Kernel to remove unnecessary spills and
fills of non-scratched registers (given some compiler assistance).
For such functions do register allocation as-if caller saved registers
are not clobbered, but later wrap the calls with spill and fill
patterns that are simple to recognize in kernel.
For example for the following C code:
#define __bpf_fastcall __attribute__((bpf_fastcall))
void bar(void) __bpf_fastcall;
void buz(long i, long j, long k);
void foo(long i, long j, long k) {
bar();
buz(i, j, k);
}
First allocate registers as if:
foo:
call bar # note: no spills for i,j,k (r1,r2,r3)
call buz
exit
And later insert spills fills on the peephole phase:
foo:
*(u64 *)(r10 - 8) = r1; # Such call pattern is
*(u64 *)(r10 - 16) = r2; # correct when used with
*(u64 *)(r10 - 24) = r3; # old kernels.
call bar
r3 = *(u64 *)(r10 - 24); # But also allows new
r2 = *(u64 *)(r10 - 16); # kernels to recognize the
r1 = *(u64 *)(r10 - 8); # pattern and remove spills/fills.
call buz
exit
The offsets for generated spills/fills are picked as minimal stack
offsets for the function. Allocated stack slots are not used for any
other purposes, in order to simplify in-kernel analysis.
This commit introduces attribute bpf_fastcall to declare BPF functions
that do not clobber some of the caller saved registers (R0-R5).
The idea is to generate the code complying with generic BPF ABI, but
allow compatible Linux Kernel to remove unnecessary spills and fills of
non-scratched registers (given some compiler assistance).
For such functions do register allocation as-if caller saved registers
are not clobbered, but later wrap the calls with spill and fill patterns
that are simple to recognize in kernel.
For example for the following C code:
#define __bpf_fastcall __attribute__((bpf_fastcall))
void bar(void) __bpf_fastcall;
void buz(long i, long j, long k);
void foo(long i, long j, long k) {
bar();
buz(i, j, k);
}
First allocate registers as if:
foo:
call bar # note: no spills for i,j,k (r1,r2,r3)
call buz
exit
And later insert spills fills on the peephole phase:
foo:
*(u64 *)(r10 - 8) = r1; # Such call pattern is
*(u64 *)(r10 - 16) = r2; # correct when used with
*(u64 *)(r10 - 24) = r3; # old kernels.
call bar
r3 = *(u64 *)(r10 - 24); # But also allows new
r2 = *(u64 *)(r10 - 16); # kernels to recognize the
r1 = *(u64 *)(r10 - 8); # pattern and remove spills/fills.
call buz
exit
The offsets for generated spills/fills are picked as minimal stack
offsets for the function. Allocated stack slots are not used for any
other purposes, in order to simplify in-kernel analysis.
Corresponding functionality had been merged in Linux Kernel as
[this](https://lore.kernel.org/bpf/172179364482.1919.9590705031832457529.git-patchwork-notify@kernel.org/)
patch set (the patch assumed that `no_caller_saved_regsiters` attribute
would be used by LLVM, naming does not matter for the Kernel).