As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4. Define 2 new flags:
BPF_LOAD_ACQ 0x100
BPF_STORE_REL 0x110
A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).
Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like the ARM64 instruction LDAPRH and friends.
As an example (assuming little-endian):
long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}
foo() can be compiled to:
db 10 00 00 00 01 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
95 00 00 00 00 00 00 00 exit
opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
imm (0x00000100): BPF_LOAD_ACQ
Similarly:
void bar(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}
bar() can be compiled to:
cb 21 00 00 10 01 00 00 store_release((u16 *)(r1 + 0x0), w2)
95 00 00 00 00 00 00 00 exit
opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
imm (0x00000110): BPF_STORE_REL
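As a hedged sketch of the 8-/16-bit zero-extension behaviour described above (the function name is made up for illustration; the encoding follows the same BPF_ATOMIC | BPF_B | BPF_STX pattern):
baz() can be written as:
unsigned char baz(unsigned char *ptr) {
    /* 8-bit load-acquire; the loaded byte is zero-extended into the
     * 32-bit destination register, as described above */
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}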
Inline assembly is also supported.
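For instance, a minimal sketch, assuming the assembler accepts the same mnemonics that the disassembler prints (constraints and function name are illustrative only):
long foo_asm(long *ptr) {
    long ret;
    asm volatile("%0 = load_acquire((u64 *)(%1 + 0))"
                 : "=r"(ret)
                 : "r"(ptr)
                 : "memory");
    return ret;
}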
Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature. It can also be disabled using a new
llc option, -disable-load-acq-store-rel.
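A minimal sketch of feature detection with the new macro (the function and the fallback are illustrative only, not part of this change):
long read_flag(long *ptr) {
#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
    /* compiler supports load-acquire/store-release */
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
#else
    /* older compiler, or -disable-load-acq-store-rel was passed */
    return *(volatile long *)ptr;
#endif
}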
Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:
void foo(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
}
6b 21 00 00 00 00 00 00 *(u16 *)(r1 + 0x0) = w2
95 00 00 00 00 00 00 00 exit
Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:
int foo(char *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}
71 11 00 00 00 00 00 00 w1 = *(u8 *)(r1 + 0x0)
bc 10 08 00 00 00 00 00 w0 = (s8)w1
95 00 00 00 00 00 00 00 exit
Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE. Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:
$ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is
not supported
1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
| ^
...
Finally, rename the isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than on their instruction class.
[1]
https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
For some reason `__BPF_FEATURE_MAY_GOTO` is available for CPUs v{2,3,4}
but is not available for CPU v1. This limitation is arbitrary:
- the instruction is never produced by the LLVM backend;
- on the Linux kernel side, this instruction is available in kernels
  that also support CPUv4.
Hence, it is more consistent to either always allow
`__BPF_FEATURE_MAY_GOTO` or only allow it for CPUv4.
Before llvm20, (void)__sync_fetch_and_add(...) always generated locked
xadd insns. In an upstream Linux kernel discussion [1], it was found
that for the arm64 architecture the original semantics of
(void)__sync_fetch_and_add(...), i.e. __atomic_fetch_add(...), are
preferred so that the JIT can emit proper native barrier insns.
With llvm commits [2] and [3], (void)__sync_fetch_and_add(...)
generates the following insns:
- for cpu v1/v2: locked xadd insns to keep backward compatibility
- for cpu v3/v4: __atomic_fetch_add() insns
To ensure proper barrier semantics for (void)__sync_fetch_and_add(...),
cpu v3/v4 is recommended.
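For reference, the pattern in question looks like this (illustrative function only):
void incr(long *counter) {
    /* cpu v1/v2: compiled to a locked xadd insn;
     * cpu v3/v4: compiled to an __atomic_fetch_add() insn, letting the
     * arm64 JIT emit the proper native barrier insns */
    (void)__sync_fetch_and_add(counter, 1);
}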
This patch enables cpu v3 as the default cpu version. Users who want to
use cpu v1 need to add -mcpu=v1 explicitly to the clang/llc command
line.
[1]
https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
[2] https://github.com/llvm/llvm-project/pull/101428
[3] https://github.com/llvm/llvm-project/pull/106494
There are a few places in the BPF backend where the `arena` name is
used for pointers in a non-zero address space; rename these to use the
more generic `address_space`:
- macro `__BPF_FEATURE_ARENA_CAST` -> `__BPF_FEATURE_ADDR_SPACE_CAST`
- section name for arena global variables: `.arena.N` ->
  `.addr_space.N`
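A hedged sketch of how the renamed macro might be used for feature detection (the `__as` helper below is illustrative and not part of this change):
```c
#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
/* compiler supports addr_space_cast: address-space-1 pointers are usable */
#define __as __attribute__((address_space(1)))
#else
#define __as
#endif
```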
This commit aims to support the BPF arena kernel-side
[feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and
userspace;
- base pointers for this memory region differ between kernel and user
spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
translates src_reg, a pointer in src_addr_space, to dst_reg, an
equivalent pointer in dst_addr_space; {src,dst}_addr_space are
immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.
On the LLVM side, the goal is to make load and store operations on arena
pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from the arena is done via a
pointer in user address space, and thus convert base pointers using
`addr_space_cast(src_reg, 0, 1)`.
Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.
For example, the following C code:
```c
#define __as __attribute__((address_space(1)))
void copy(int __as *from, int __as *to) { *to = *from; }
```
Compiled to the following IR:
```llvm
define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
entry:
%0 = load i32, ptr addrspace(1) %from, align 4
store i32 %0, ptr addrspace(1) %to, align 4
ret void
}
```
Is transformed to:
```llvm
%to2 = addrspacecast ptr addrspace(1) %to to ptr ;; !
%from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
%0 = load i32, ptr %from1, align 4, !tbaa !3
store i32 %0, ptr %to2, align 4, !tbaa !3
ret void
```
And compiled as:
```asm
r2 = addr_space_cast(r2, 0, 1)
r1 = addr_space_cast(r1, 0, 1)
r1 = *(u32 *)(r1 + 0)
*(u32 *)(r2 + 0) = r1
exit
```
Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>
Sometimes bpf developers might want to write different code based on
particular cpu versions. For example, the cpu v1/v2/v3 branch target is
16-bit while the cpu v4 branch target is 32-bit, so cpu v4 allows more
aggressive loop unrolling than cpu v1/v2/v3 (see [1] for a kernel
selftest failure due to this). We would like to maintain aggressive
loop unrolling for cpu v4 while limiting loop unrolling for earlier cpu
versions. As another example, signed divide is only available with cpu
v4.
Adding cpu-specific macros is fairly common in llvm. For example, x86
has macros like 'i486', '__pentium_mmx__', etc., and AArch64 has
'__ARM_NEON', '__ARM_FEATURE_SVE', etc.
This patch adds the __BPF_CPU_VERSION__ macro. Current possible values
are 0/1/2/3/4. The following is the -mcpu=... to __BPF_CPU_VERSION__
mapping:
```
cpu               __BPF_CPU_VERSION__
no -mcpu=<...>    1
-mcpu=v1          1
-mcpu=v2          2
-mcpu=v3          3
-mcpu=v4          4
-mcpu=generic     1
-mcpu=probe       0
```
This patch also adds some macros for developers to identify cpu insn
features:
```
feature macro               enabled in which cpu
__BPF_FEATURE_JMP_EXT       >= v2
__BPF_FEATURE_JMP32         >= v3
__BPF_FEATURE_ALU32         >= v3
__BPF_FEATURE_LDSX          >= v4
__BPF_FEATURE_MOVSX         >= v4
__BPF_FEATURE_BSWAP         >= v4
__BPF_FEATURE_SDIV_SMOD     >= v4
__BPF_FEATURE_GOTOL         >= v4
__BPF_FEATURE_ST            >= v4
```
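As a hedged illustration (the function and loop bound are made up), these macros allow per-cpu-version code selection:
```c
/* Sketch only: cpu v4 has 32-bit branch targets (gotol), so full unrolling
 * is safe; earlier cpus have 16-bit branch targets, so keep the loop rolled. */
static inline long sum64(const long *arr)
{
	long sum = 0;
	int i;

#if __BPF_CPU_VERSION__ >= 4
#pragma clang loop unroll(full)
#else
#pragma clang loop unroll(disable)
#endif
	for (i = 0; i < 64; i++)
		sum += arr[i];
	return sum;
}
```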
[1]
https://lore.kernel.org/bpf/3e3a8a30-dde0-43a1-981e-2274962780ef@linux.dev/
"DefineStd(Builder, "bpf", Opts)" generates the following three
macros:
bpf
__bpf
__bpf__
and the macro "bpf" is due to the fact that the target language
is C which allows GNU extensions.
The name "bpf" could be easily used as variable name or type
field name. For example, in current linux kernel, there are
four places where bpf is used as a field name. If the corresponding
types are included in bpf program, the compilation error will
occur.
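For example (hypothetical type, mirroring the kernel situation described above), with "bpf" predefined to 1 the following fails to compile when included in a bpf program:
struct sock_cfg {
    int bpf;    /* the preprocessor rewrites this to "int 1;" */
};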
This patch removes the predefined macro "bpf" as well as "__bpf", which
is rarely used if used at all.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D61173
llvm-svn: 359310