As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4. Define 2 new flags:
BPF_LOAD_ACQ 0x100
BPF_STORE_REL 0x110
A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).
Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like the ARM64 instruction LDAPRH and friends.
As an example (assuming little-endian):
long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}
foo() can be compiled to:
db 10 00 00 00 01 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
95 00 00 00 00 00 00 00 exit
opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
imm (0x00000100): BPF_LOAD_ACQ
Similarly:
void bar(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}
bar() can be compiled to:
cb 21 00 00 10 01 00 00 store_release((u16 *)(r1 + 0x0), w2)
95 00 00 00 00 00 00 00 exit
opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
imm (0x00000110): BPF_STORE_REL
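As a hedged sketch of the 8-/16-bit zero-extension behaviour described above (the function name is made up for illustration; the encoding follows the same BPF_ATOMIC | BPF_B | BPF_STX pattern):
baz() can be written as:
unsigned char baz(unsigned char *ptr) {
    /* 8-bit load-acquire; the loaded byte is zero-extended into the
     * 32-bit destination register, as described above */
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}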
Inline assembly is also supported.
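For instance, a minimal sketch, assuming the assembler accepts the same mnemonics that the disassembler prints (constraints and function name are illustrative only):
long foo_asm(long *ptr) {
    long ret;
    asm volatile("%0 = load_acquire((u64 *)(%1 + 0))"
                 : "=r"(ret)
                 : "r"(ptr)
                 : "memory");
    return ret;
}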
Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature. It can also be disabled using a new
llc option, -disable-load-acq-store-rel.
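A minimal sketch of feature detection with the new macro (the function and the fallback are illustrative only, not part of this change):
long read_flag(long *ptr) {
#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
    /* compiler supports load-acquire/store-release */
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
#else
    /* older compiler, or -disable-load-acq-store-rel was passed */
    return *(volatile long *)ptr;
#endif
}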
Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:
void foo(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
}
6b 21 00 00 00 00 00 00 *(u16 *)(r1 + 0x0) = w2
95 00 00 00 00 00 00 00 exit
Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:
int foo(char *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}
71 11 00 00 00 00 00 00 w1 = *(u8 *)(r1 + 0x0)
bc 10 08 00 00 00 00 00 w0 = (s8)w1
95 00 00 00 00 00 00 00 exit
Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE. Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:
$ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is
not supported
1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
| ^
...
Finally, rename the isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than on their instruction class.
[1]
https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
For some reason `__BPF_FEATURE_MAY_GOTO` is available for CPUs v{2,3,4}
but is not available for CPU v1. This limitation is arbitrary:
- the instruction is never produced by the LLVM backend;
- on the Linux kernel side, this instruction is available in kernels
  that also support CPUv4.
Hence, it is more consistent to either always allow
`__BPF_FEATURE_MAY_GOTO` or only allow it for CPUv4.
Before llvm20, (void)__sync_fetch_and_add(...) always generated locked
xadd insns. In an upstream Linux kernel discussion [1], it was found
that for the arm64 architecture the original semantics of
(void)__sync_fetch_and_add(...), i.e. __atomic_fetch_add(...), are
preferred so that the JIT can emit proper native barrier insns.
With llvm commits [2] and [3], (void)__sync_fetch_and_add(...)
generates the following insns:
- for cpu v1/v2: locked xadd insns to keep backward compatibility
- for cpu v3/v4: __atomic_fetch_add() insns
To ensure proper barrier semantics for (void)__sync_fetch_and_add(...),
cpu v3/v4 is recommended.
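For reference, the pattern in question looks like this (illustrative function only):
void incr(long *counter) {
    /* cpu v1/v2: compiled to a locked xadd insn;
     * cpu v3/v4: compiled to an __atomic_fetch_add() insn, letting the
     * arm64 JIT emit the proper native barrier insns */
    (void)__sync_fetch_and_add(counter, 1);
}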
This patch enables cpu v3 as the default cpu version. Users who want to
use cpu v1 need to add -mcpu=v1 explicitly to the clang/llc command
line.
[1]
https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
[2] https://github.com/llvm/llvm-project/pull/101428
[3] https://github.com/llvm/llvm-project/pull/106494
There are a few places in the BPF backend where the `arena` name is
used for pointers in a non-zero address space; rename these to use the
more generic `address_space`:
- macro `__BPF_FEATURE_ARENA_CAST` -> `__BPF_FEATURE_ADDR_SPACE_CAST`
- section name for arena global variables: `.arena.N` ->
  `.addr_space.N`
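A hedged sketch of how the renamed macro might be used for feature detection (the `__as` helper below is illustrative and not part of this change):
```c
#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
/* compiler supports addr_space_cast: address-space-1 pointers are usable */
#define __as __attribute__((address_space(1)))
#else
#define __as
#endif
```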
This commit aims to support the BPF arena kernel-side
[feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and
userspace;
- base pointers for this memory region differ between kernel and user
spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
translates src_reg, a pointer in src_addr_space, to dst_reg, an
equivalent pointer in dst_addr_space; {src,dst}_addr_space are
immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.
On the LLVM side, the goal is to make load and store operations on arena
pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from the arena is done via a
pointer in user address space, and thus convert base pointers using
`addr_space_cast(src_reg, 0, 1)`.
Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.
For example, the following C code:
```c
#define __as __attribute__((address_space(1)))
void copy(int __as *from, int __as *to) { *to = *from; }
```
Compiled to the following IR:
```llvm
define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
entry:
%0 = load i32, ptr addrspace(1) %from, align 4
store i32 %0, ptr addrspace(1) %to, align 4
ret void
}
```
Is transformed to:
```llvm
%to2 = addrspacecast ptr addrspace(1) %to to ptr ;; !
%from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
%0 = load i32, ptr %from1, align 4, !tbaa !3
store i32 %0, ptr %to2, align 4, !tbaa !3
ret void
```
And compiled as:
```asm
r2 = addr_space_cast(r2, 0, 1)
r1 = addr_space_cast(r1, 0, 1)
r1 = *(u32 *)(r1 + 0)
*(u32 *)(r2 + 0) = r1
exit
```
Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>
Sometimes bpf developers might want to write different code based on
particular cpu versions. For example, the cpu v1/v2/v3 branch target is
16-bit while the cpu v4 branch target is 32-bit, so cpu v4 allows more
aggressive loop unrolling than cpu v1/v2/v3 (see [1] for a kernel
selftest failure due to this). We would like to maintain aggressive
loop unrolling for cpu v4 while limiting loop unrolling for earlier cpu
versions. As another example, signed divide is only available with cpu
v4.
Adding cpu-specific macros is fairly common in llvm. For example, x86
has macros like 'i486', '__pentium_mmx__', etc., and AArch64 has
'__ARM_NEON', '__ARM_FEATURE_SVE', etc.
This patch adds the __BPF_CPU_VERSION__ macro. Current possible values
are 0/1/2/3/4. The following is the -mcpu=... to __BPF_CPU_VERSION__
mapping:
```
cpu               __BPF_CPU_VERSION__
no -mcpu=<...>    1
-mcpu=v1          1
-mcpu=v2          2
-mcpu=v3          3
-mcpu=v4          4
-mcpu=generic     1
-mcpu=probe       0
```
This patch also adds some macros for developers to identify cpu insn
features:
```
feature macro               enabled in which cpu
__BPF_FEATURE_JMP_EXT       >= v2
__BPF_FEATURE_JMP32         >= v3
__BPF_FEATURE_ALU32         >= v3
__BPF_FEATURE_LDSX          >= v4
__BPF_FEATURE_MOVSX         >= v4
__BPF_FEATURE_BSWAP         >= v4
__BPF_FEATURE_SDIV_SMOD     >= v4
__BPF_FEATURE_GOTOL         >= v4
__BPF_FEATURE_ST            >= v4
```
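As a hedged illustration (the function and loop bound are made up), these macros allow per-cpu-version code selection:
```c
/* Sketch only: cpu v4 has 32-bit branch targets (gotol), so full unrolling
 * is safe; earlier cpus have 16-bit branch targets, so keep the loop rolled. */
static inline long sum64(const long *arr)
{
	long sum = 0;
	int i;

#if __BPF_CPU_VERSION__ >= 4
#pragma clang loop unroll(full)
#else
#pragma clang loop unroll(disable)
#endif
	for (i = 0; i < 64; i++)
		sum += arr[i];
	return sum;
}
```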
[1]
https://lore.kernel.org/bpf/3e3a8a30-dde0-43a1-981e-2274962780ef@linux.dev/
"DefineStd(Builder, "bpf", Opts)" generates the following three
macros:
bpf
__bpf
__bpf__
and the macro "bpf" is due to the fact that the target language
is C which allows GNU extensions.
The name "bpf" could be easily used as variable name or type
field name. For example, in current linux kernel, there are
four places where bpf is used as a field name. If the corresponding
types are included in bpf program, the compilation error will
occur.
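For example (hypothetical type, mirroring the kernel situation described above), with "bpf" predefined to 1 the following fails to compile when included in a bpf program:
struct sock_cfg {
    int bpf;    /* the preprocessor rewrites this to "int 1;" */
};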
This patch removes the predefined macro "bpf" as well as "__bpf", which
is rarely used if used at all.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D61173
llvm-svn: 359310