17 Commits

Author SHA1 Message Date
yonghong-song
0e0bfacff7
[BPF] Add support for may_goto insn (#85358)
Alexei added may_goto insn in [1]. The asm syntax for may_goto looks
like
  may_goto <label>

The instruction represents a conditional branch but the condition is
implicit. Later in bpf kernel verifier, the 'may_goto <label>' insn will
be rewritten with an explicit condition. The encoding of 'may_goto' insn
is enforced in [2] and is also implemented in this patch.

In [3], 'may_goto' insn is encoded with raw bytes. I made the following
change
```
  --- a/tools/testing/selftests/bpf/bpf_experimental.h
  +++ b/tools/testing/selftests/bpf/bpf_experimental.h
  @@ -328,10 +328,7 @@ l_true:                                                                                            \

   #define cond_break                                     \
          ({ __label__ l_break, l_continue;               \
  -        asm volatile goto("1:.byte 0xe5;                       \
  -                     .byte 0;                          \
  -                     .long ((%l[l_break] - 1b - 8) / 8) & 0xffff;      \
  -                     .short 0"                         \
  +        asm volatile goto("may_goto %l[l_break]"       \
                        :::: l_break);                    \
          goto l_continue;                                \
          l_break: break;
```
and ran the selftest with the latest llvm with this patch. All tests are
passed.

[1]
https://lore.kernel.org/bpf/20240306031929.42666-1-alexei.starovoitov@gmail.com/
[2]
https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com/
[3]
https://lore.kernel.org/bpf/20240306031929.42666-4-alexei.starovoitov@gmail.com/
2024-03-15 07:24:28 -07:00
eddyz87
65b123e287
[BPF] rename 'arena' to 'address_space' (#85161)
There are a few places where `arena` name is used for pointers in
non-zero address space in BPF backend, rename these to use a more
generic `address_space`:
- macro `__BPF_FEATURE_ARENA_CAST` -> `__BPF_FEATURE_ADDR_SPACE_CAST
- name for arena global variables section `.arena.N` ->
`.addr_space.N`
2024-03-14 19:20:06 -07:00
4ast
2aacb56e83
BPF address space insn (#84410)
This commit aims to support BPF arena kernel side
[feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and
userspace;
- base pointers for this memory region differ between kernel and user
spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
translates src_reg, a pointer in src_addr_space to dst_reg, equivalent
pointer in dst_addr_space, {src,dst}_addr_space are immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on arena
pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
  arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from arena is done via
pointer in user address space, thus convert base pointers using
`addr_space_cast(src_reg, 0, 1)`;

Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.

For example, the following C code:

```c
   #define __as __attribute__((address_space(1)))
   void copy(int __as *from, int __as *to) { *to = *from; }
```

Compiled to the following IR:

```llvm
    define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
    entry:
      %0 = load i32, ptr addrspace(1) %from, align 4
      store i32 %0, ptr addrspace(1) %to, align 4
      ret void
    }
```

Is transformed to:

```llvm
    %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
    %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
    %0 = load i32, ptr %from1, align 4, !tbaa !3
    store i32 %0, ptr %to2, align 4, !tbaa !3
    ret void
```

And compiled as:

```asm
    r2 = addr_space_cast(r2, 0, 1)
    r1 = addr_space_cast(r1, 0, 1)
    r1 = *(u32 *)(r1 + 0)
    *(u32 *)(r2 + 0) = r1
    exit
```

Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>
2024-03-13 02:27:25 +02:00
Nikolas Klauser
4a58284559
[clang] Refactor Builtins.def to be a tablegen file (#68324)
This makes the builtins list quite a bit more verbose, but IMO this is a
huge win in terms of readability.
2024-01-24 11:22:43 +01:00
yonghong-song
4e67234357
[Clang][BPF] Add __BPF_CPU_VERSION__ macro (#71856)
Sometimes bpf developer might want to develop different codes
based on particular cpu versioins. For example, cpu v1/v2/v3
branch target is 16bit while cpu v4 branch target is 32bit,
thus cpu v4 allows more aggressive loop unrolling than cpu v1/v2/v3
(see [1] for a kernel selftest failure due to this).
We would like to maintain aggressive loop unrolling for cpu v4
while limit loop unrolling for earlier cpu versions.
Another example, signed divide also only available with cpu v4.

Actually, adding cpu specific macros are fairly common
in llvm. For example, x86 has maco like 'i486', '__pentium_mmx__', etc.
AArch64 has '__ARM_NEON', '__ARM_FEATURE_SVE', etc.

This patch added __BPF_CPU_VERSION__ macro. Current possible values
are 0/1/2/3/4. The following are the -mcpu=... to __BPF_CPU_VERSION__
mapping:
```
       cpu                  __BPF_CPU_VERSION__
       no -mcpu=<...>       1
       -mcpu=v1             1
       -mcpu=v2             2
       -mcpu=v3             3
       -mcpu=v4             4
       -mcpu=generic        1
       -mcpu=probe          0
```
    
This patch also added some macros for developers to identify some cpu
insn features:
```
      feature macro               enabled in which cpu
      __BPF_FEATURE_JMP_EXT       >= v2
      __BPF_FEATURE_JMP32         >= v3
      __BPF_FEATURE_ALU32         >= v3
      __BPF_FEATURE_LDSX          >= v4
      __BPF_FEATURE_MOVSX         >= v4
      __BPF_FEATURE_BSWAP         >= v4
      __BPF_FEATURE_SDIV_SMOD     >= v4
      __BPF_FEATURE_GOTOL         >= v4
      __BPF_FEATURE_ST            >= v4
```    
[1]
https://lore.kernel.org/bpf/3e3a8a30-dde0-43a1-981e-2274962780ef@linux.dev/
2023-11-10 10:18:54 -08:00
Yonghong Song
6c412b6c6f [BPF] Add a few new insns under cpu=v4
In [1], a few new insns are proposed to expand BPF ISA to
  . fixing the limitation of existing insn (e.g., 16bit jmp offset)
  . adding new insns which may improve code quality
    (sign_ext_ld, sign_ext_mov, st)
  . feature complete (sdiv, smod)
  . better user experience (bswap)

This patch implemented insn encoding for
  . sign-extended load
  . sign-extended mov
  . sdiv/smod
  . bswap insns
  . unconditional jump with 32bit offset

The new bswap insns are generated under cpu=v4 for __builtin_bswap.
For cpu=v3 or earlier, for __builtin_bswap, be or le insns are generated
which is not intuitive for the user.

To support 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented.
For conditional branch which is beyond 16-bit offset, llvm will do
some transformation 'cond_jmp' -> 'cond_jmp + jmpl' to simulate 32bit
conditional jmp. See BPFMIPeephole.cpp for details. The algorithm is
hueristic based. I have tested bpf selftest pyperf600 with unroll account
600 which can indeed generate 32-bit jump insn, e.g.,
        13:       06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619>

Eduard is working on to add 'st' insn to cpu=v4.

A list of llc flags:
  disable-ldsx, disable-movsx, disable-bswap,
  disable-sdiv-smod, disable-gotol
can be used to disable a particular insn for cpu v4.
For example, user can do:
  llc -march=bpf -mcpu=v4 -disable-movsx t.ll
to enable cpu v4 without movsx insns.

References:
  [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/

Differential Revision: https://reviews.llvm.org/D144829
2023-07-26 08:37:30 -07:00
serge-sans-paille
5a7f47cc02
[clang] Optimize clang::Builtin::Info density
Reorganize clang::Builtin::Info to have them naturally align on 4 bytes
boundaries.

Instead of storing builtin headers as a straight char pointer, enumerate
them and store the enum. It allows to use a small enum instead of a
pointer to reference them.

On a 64 bit machine, this brings sizeof(clang::Builtin::Info) from 56
down to 48 bytes.

On a release build on my Linux 64 bit machine, it shrinks the size of
libclang-cpp.so by 193kB.

The impact on performance is negligible in terms of instruction count,
but the wall time seems better, see
https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock

Differential Revision: https://reviews.llvm.org/D142024
2023-01-23 14:27:44 +01:00
serge-sans-paille
a3c248db87
Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part
This is a follow-up to https://reviews.llvm.org/D140896, split into
several parts as it touches a lot of files.

Differential Revision: https://reviews.llvm.org/D141139
2023-01-09 12:15:24 +01:00
serge-sans-paille
d9ab3e82f3
[clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.

The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-27 09:55:19 +01:00
Kazu Hirata
0e9373a6a6 [Basic] Use llvm::is_contained (NFC) 2021-10-10 08:52:14 -07:00
Alessandro Decina
833e9b2ea7 [BPF] add support for 32 bit registers in inline asm
Add "w" constraint type which allows selecting 32 bit registers.
32 bit registers were added in https://reviews.llvm.org/rGca31c3bb3ff149850b664838fbbc7d40ce571879.

Differential Revision: https://reviews.llvm.org/D102118
2021-05-16 11:01:47 -07:00
Yonghong Song
05e46979d2 [BPF] do compile-once run-everywhere relocation for bitfields
A bpf specific clang intrinsic is introduced:
   u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include relocation kind.
Currently supported relocation kinds are
 enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
 };
for __builtin_preserve_field_info. The old
access offset relocation is covered by
    FIELD_BYTE_OFFSET = 0.

An example:
struct s {
    int a;
    int b1:9;
    int b2:4;
};
enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
 #ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
 #endif
 #else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocation generated for field_read where bpf_probe_read() is
used to access argument data on little endian mode:
        r3 = r1
        r1 = 0
        r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
        r3 += r1
        r1 = r10
        r1 += -8
        r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
        call bpf_probe_read
        r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r2
        r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
        r0 = r1
        r0 >>= r2
        r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
        if r3 == 0 goto LBB0_2
        r1 s>>= r2
        r0 = r1
LBB0_2:
        exit

Compare to the above code between relocations FIELD_LSHIFT_U64 and
FIELD_LSHIFT_U64, the code with big endian mode has four more
instructions.
        r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
        r6 += r1
        r6 += -64
        r6 <<= 32
        r6 >>= 32
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r6
        r2 = 60   <=== relocation (FIELD_RSHIFT_U64)

The code and relocation generated when using direct load.
        r2 = 0
        r3 = 4
        r4 = 4
        if r4 s> 3 goto LBB0_3
        if r4 == 1 goto LBB0_5
        if r4 == 2 goto LBB0_6
        goto LBB0_9
LBB0_6:                                 # %sw.bb1
        r1 += r3
        r2 = *(u16 *)(r1 + 0)
        goto LBB0_9
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
        if r4 == 8 goto LBB0_8
        goto LBB0_9
LBB0_8:                                 # %sw.bb9
        r1 += r3
        r2 = *(u64 *)(r1 + 0)
        goto LBB0_9
LBB0_5:                                 # %sw.bb
        r1 += r3
        r2 = *(u8 *)(r1 + 0)
        goto LBB0_9
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51
        r2 <<= r1
        r1 = 60
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

Considering verifier is able to do limited constant
propogation following branches. The following is the
code actually traversed.
        r2 = 0
        r3 = 4   <=== relocation
        r4 = 4   <=== relocation
        if r4 s> 3 goto LBB0_3
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51   <=== relocation
        r2 <<= r1
        r1 = 60   <=== relocation
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

For native load case, the load size is calculated to be the
same as the size of load width LLVM otherwise used to load
the value which is then used to extract the bitfield value.

Differential Revision: https://reviews.llvm.org/D67980

llvm-svn: 374099
2019-10-08 18:23:17 +00:00
Yonghong Song
51a4a0d68f [BPF] do not generate predefined macro bpf
"DefineStd(Builder, "bpf", Opts)" generates the following three
macros:
  bpf
  __bpf
  __bpf__
and the macro "bpf" is due to the fact that the target language
is C which allows GNU extensions.

The name "bpf" could be easily used as variable name or type
field name. For example, in current linux kernel, there are
four places where bpf is used as a field name. If the corresponding
types are included in bpf program, the compilation error will
occur.

This patch removed predefined macro "bpf" as well as "__bpf" which
is rarely used if used at all.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61173

llvm-svn: 359310
2019-04-26 15:35:51 +00:00
Jiong Wang
862e7405e8 bpf: teach BPF driver about the new CPU "v3"
This patch simply teach BPF driver about the new CPU "v3" introduced in
LLVM backend.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 353479
2019-02-07 22:51:56 +00:00
Chandler Carruth
2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Erich Keane
e44bdb3f70 Add Rest of Targets Support to ValidCPUList (enabling march notes)
A followup to: https://reviews.llvm.org/D42978

Most of the rest of the Targets were pretty rote, so this
patch knocks them all out at once. 

Differential Revision: https://reviews.llvm.org/D43057

llvm-svn: 324676
2018-02-08 23:16:55 +00:00
Erich Keane
ebba592682 Break up Targets.cpp into a header/impl pair per target type[NFCI]
Targets.cpp is getting unwieldy, and even minor changes cause the entire thing 
to cause recompilation for everyone. This patch bites the bullet and breaks 
it up into a number of files.

I tended to keep function definitions in the class declaration unless it 
caused additional includes to be necessary. In those cases, I pulled it 
over into the .cpp file. Content is copy/paste for the most part, 
besides includes/format/etc.


Differential Revision: https://reviews.llvm.org/D35701

llvm-svn: 308791
2017-07-21 22:37:03 +00:00