587 Commits

Author SHA1 Message Date
Yingchi Long
70deb7bfe9
[BPF] expand cttz, ctlz for i32, i64 (#73668)
Fixes: https://github.com/llvm/llvm-project/issues/62252

Depends on: #73667
2024-04-01 10:57:54 +08:00
Sergei Barannikov
5e5b656102
[MC] Make MCParsedAsmOperand::getReg() return MCRegister (#86444) 2024-03-25 05:13:48 +03:00
paperchalice
635ea257ec
[NewPM] Fix BPF build (#86379)
Add Passes in dependency list
2024-03-23 13:06:58 +08:00
paperchalice
2aa5bae0c0
[NewPM][BPF] Add BPFPassRegistry.def NFCI (#86241)
Prepare migration for dag-isel.
2024-03-23 12:53:26 +08:00
Jeremy Morse
b9d83eff25
[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736)
These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.

In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>
2024-03-19 16:36:29 +00:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
yonghong-song
0e0bfacff7
[BPF] Add support for may_goto insn (#85358)
Alexei added may_goto insn in [1]. The asm syntax for may_goto looks
like
  may_goto <label>

The instruction represents a conditional branch but the condition is
implicit. Later in bpf kernel verifier, the 'may_goto <label>' insn will
be rewritten with an explicit condition. The encoding of 'may_goto' insn
is enforced in [2] and is also implemented in this patch.

In [3], 'may_goto' insn is encoded with raw bytes. I made the following
change
```
  --- a/tools/testing/selftests/bpf/bpf_experimental.h
  +++ b/tools/testing/selftests/bpf/bpf_experimental.h
  @@ -328,10 +328,7 @@ l_true:                                                                                            \

   #define cond_break                                     \
          ({ __label__ l_break, l_continue;               \
  -        asm volatile goto("1:.byte 0xe5;                       \
  -                     .byte 0;                          \
  -                     .long ((%l[l_break] - 1b - 8) / 8) & 0xffff;      \
  -                     .short 0"                         \
  +        asm volatile goto("may_goto %l[l_break]"       \
                        :::: l_break);                    \
          goto l_continue;                                \
          l_break: break;
```
and ran the selftest with the latest llvm with this patch. All tests are
passed.

[1]
https://lore.kernel.org/bpf/20240306031929.42666-1-alexei.starovoitov@gmail.com/
[2]
https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com/
[3]
https://lore.kernel.org/bpf/20240306031929.42666-4-alexei.starovoitov@gmail.com/
2024-03-15 07:24:28 -07:00
eddyz87
65b123e287
[BPF] rename 'arena' to 'address_space' (#85161)
There are a few places where `arena` name is used for pointers in
non-zero address space in BPF backend, rename these to use a more
generic `address_space`:
- macro `__BPF_FEATURE_ARENA_CAST` -> `__BPF_FEATURE_ADDR_SPACE_CAST
- name for arena global variables section `.arena.N` ->
`.addr_space.N`
2024-03-14 19:20:06 -07:00
4ast
2aacb56e83
BPF address space insn (#84410)
This commit aims to support BPF arena kernel side
[feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and
userspace;
- base pointers for this memory region differ between kernel and user
spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
translates src_reg, a pointer in src_addr_space to dst_reg, equivalent
pointer in dst_addr_space, {src,dst}_addr_space are immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on arena
pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
  arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from arena is done via
pointer in user address space, thus convert base pointers using
`addr_space_cast(src_reg, 0, 1)`;

Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.

For example, the following C code:

```c
   #define __as __attribute__((address_space(1)))
   void copy(int __as *from, int __as *to) { *to = *from; }
```

Compiled to the following IR:

```llvm
    define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
    entry:
      %0 = load i32, ptr addrspace(1) %from, align 4
      store i32 %0, ptr addrspace(1) %to, align 4
      ret void
    }
```

Is transformed to:

```llvm
    %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
    %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
    %0 = load i32, ptr %from1, align 4, !tbaa !3
    store i32 %0, ptr %to2, align 4, !tbaa !3
    ret void
```

And compiled as:

```asm
    r2 = addr_space_cast(r2, 0, 1)
    r1 = addr_space_cast(r1, 0, 1)
    r1 = *(u32 *)(r1 + 0)
    *(u32 *)(r2 + 0) = r1
    exit
```

Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>
2024-03-13 02:27:25 +02:00
Yingchi Long
cf922e51b8
[BPF] lowering target address leaf nodes tconstpool (#73667)
Adds custom lowering for tconstpool.

Please ref: https://github.com/llvm/llvm-project/pull/73668 for test
coverage
2024-03-06 19:47:44 +08:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
yonghong-song
c43ad6c0fd
BPF: Change callx insn encoding (#81546)
Currently, the kernel verifier unsupported callx insn used the 32-bit
imm field to store the target register. On the other hand, gcc used the
dst_reg field to store the target register. The gcc encoding is better.
This patch adjusted the coding to be the same as gcc.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
2024-02-12 20:08:01 -08:00
James Y Knight
b856e77b2d
Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703)
Targets affected:

- NVPTX and BPF: set to 64 bits.
- ARC, Lanai, and MSP430: set to 0 (they don't implement atomics).

Those which didn't yet add AtomicExpandPass to their pass pipeline now
do so.

This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass. On all these targets, this
now matches what Clang already does in the frontend.

The only targets which do not configure AtomicExpandPass now are:
- DirectX and SPIRV: they aren't normal backends.
- AVR: a single-cpu architecture with no privileged/user divide, which
could implement all atomics by disabling/enabling interrupts, regardless
of size/alignment. Will be addressed by future work.
2024-01-08 22:34:28 -05:00
paperchalice
ffb1f20e0d
[CodeGen] Add flag to populate target pass names (#76328)
`print-pipeline-passes` can show target pass names.
2024-01-03 09:07:02 +08:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00
Yingchi Long
ddf85b92aa
[BPF] improve error handling by custom lowering & fail() (#75088)
Currently on mcpu=v3 we do not support sdiv, srem instructions. And the
backend crashes with stacktrace & coredump, which is misleading for end
users, as this is not a "bug"

Add llvm bug reporting for sdiv/srem on ISel legalize-op phase.

For clang frontend we can get detailed location & bug report.

    $ build/bin/clang -g -target bpf -c local/sdiv.c
local/sdiv.c:1:35: error: unsupported signed division, please convert to
unsigned div/mod.
        1 | int sdiv(int a, int b) { return a / b; }
          |                                   ^
    1 error generated.

Fixes: #70433
Fixes: #48647

This also improves error handling for dynamic stack allocation:

    local/vla.c:2:3: error: unsupported dynamic stack allocation
        2 |   int b[n];
          |   ^
    1 error generated.

Fixes: https://github.com/llvm/llvm-project/issues/57171
2023-12-13 13:41:52 +08:00
Yingchi Long
c4ac1d239f
[BPF][GlobalISel] select non-PreISelGenericOpcode (#75034)
This selects non-PreISelGenericOpcode as-is.

Depends on: #74999

Co-authored-by: Origami404 <Origami404@foxmail.com>
2023-12-12 16:19:34 +08:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Yingchi Long
2460bf2fac
[BPF][GlobalISel] add initial gisel support for BPF (#74999)
This adds initial codegen support for BPF backend.

Only implemented ir-translator for "RET" (but not support isel).

Depends on: #74998
2023-12-11 19:58:34 +08:00
Yingchi Long
75193b192a
[BPF] use target triple for pattern predicates (#74998)
This is used for eliminate uses of "CurDAG", which is SelectionDAG-spec,
and not compatible with GIsel algorithms.

(NFC)
2023-12-11 17:10:39 +08:00
yonghong-song
32d535195e
BPF: Emit an error for illegal LD_imm64 insn when LLVM_ENABLE_ASSERTI… (#74035)
…ONS=OFF

Jose reported an issue ([1]) where for the below illegal asm code
```
  r0 = 1 + w3 ll
```
clang actually supports it and generates the object code.

Further investigation finds that clang actually intends to reject the
above code as well but only when the clang is built with
LLVM_ENABLE_ASSERTIONS=ON.

I later found that clang16 (built by redhat and centos) in fedora system
has the same issue since they also have LLVM_ENABLE_ASSERTIONS=OFF
([2]).

So let BPF backend report an error for the above case regardless of the
LLVM_ENABLE_ASSERTIONS setting.

  [1] https://lore.kernel.org/bpf/87leahx2xh.fsf@oracle.com/#t
[2]
https://lore.kernel.org/bpf/840e33ec-ea4c-4b55-bda1-0be8d1e0324f@linux.dev/

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
2023-12-07 11:29:40 -08:00
Eduard Zingerman
030b8cb156 [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF specific structure attribte
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. In case if register with such tag is modified BPF
  programs are not allowed to read or write memory using register. See
  kernel/bpf/verifier.c:check_mem_access function in the Linux kernel
  source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The __attribute__((preserve_static_offset)) has a priority over
__attribute__((preserve_access_index)). When preserve_access_index
attribute is present preserve access index transformations are not
applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

This is a second attempt to commit this change, previous reverted
commit is: cb13e9286b6d4e384b5d4203e853d44e2eff0f0f.
The following items had been fixed:
- test case bpf-preserve-static-offset-bitfield.c now uses
  `-triple bpfel` to avoid different codegen for little/big endian
  targets.
- BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid
  use after free for `WorkList` elements `V`.

Differential Revision: https://reviews.llvm.org/D133361
2023-12-05 19:21:42 +02:00
Eduard Zingerman
2484469803 Revert "[BPF] Attribute preserve_static_offset for structs"
This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f.
Buildbot reports MSAN failures in tests added in this commit:
https://lab.llvm.org/buildbot/#/builders/5/builds/38806

Failing tests:
  LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll
2023-11-30 22:29:45 +02:00
Eduard Zingerman
cb13e9286b [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF specific structure attribte
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. In case if register with such tag is modified BPF
  programs are not allowed to read or write memory using register. See
  kernel/bpf/verifier.c:check_mem_access function in the Linux kernel
  source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The __attribute__((preserve_static_offset)) has a priority over
__attribute__((preserve_access_index)). When preserve_access_index
attribute is present preserve access index transformations are not
applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

Differential Revision: https://reviews.llvm.org/D133361
2023-11-30 19:45:03 +02:00
yonghong-song
e247e6ff27
[BPF] Add asm support for JSET insn (#73161)
BPF upstream reported that JSET insn is not supported in inline asm
([1]). BPF_JSET insn is part of BPF ISA so let us add asm support for it
now.

[1]
https://lore.kernel.org/bpf/2e8a1584-a289-4b2e-800c-8b463e734bcb@linux.dev/
2023-11-27 18:43:24 -08:00
Nikita Popov
7eeedc124f [CodeGen] Make some includes explicit (NFC)
Explicitly include some headers or forward-declare types, in
preparation for removing an include that pulls in many transitive
headers.
2023-11-24 14:43:18 +01:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
yonghong-song
32e35b21b5
[BPF] Skip modifiers for __builtin_btf_type_id() local type (#71094)
BPF upstream reported an inconsistent behavior w.r.t. BPF_TYPE_ID_LOCAL
vs. BPF_TYPE_ID_TARGET (or BPF_TYPE_ID_REMOTE in LLVM terminology).

For BPF_TYPE_ID_TARGET, all modifiers (like 'const' and 'volatile') are
ignored in the final type encoding. For example, for type
 'const struct foo', the eventually encoding in BTF relocation
is 'struct foo'. This faciliates libbpf to match corresponding kernel
types with considering any modifiers.

Currently behavior for BPF_TYPE_ID_LOCAL is different. It will encode
'const struct foo' in BTF relocation and such discrepancy confused users
([1]).

This patch fixed this discrepancy by making BPF_TYPE_ID_LOCAL BTF type
representation the sams as BPF_TYPE_ID_TARGET. This should have minimum
user impact since ultimately user wants to get a real time not a 'const'
type modifier.

The selftest builtin-btf-type-id-2.ll is used to test BPF_TYPE_ID_TARGET
with 'const' modifier. Adapt the same test for BPF_TYPE_ID_LOCAL. And
the below diff shows now both BPF_TYPE_ID_LOCAL and BPF_TYPE_ID_TARGET
produces the same type:

$ diff test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll
test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll
--- test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll 2023-07-30
16:58:20.657528310 -0700
+++ test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll 2023-11-02
10:23:25.356959008 -0700
  @@ -6,7 +6,7 @@
   ;     int a;
   ;   };
   ;   int test(void) {
  -;     return __builtin_btf_type_id(*(const struct s *)0, 1);
  +;     return __builtin_btf_type_id(*(const struct s *)0, 0);
   ;   }
   ; Compilation flag:
; clang -target bpf -O2 -g -S -emit-llvm -Xclang -disable-llvm-passes
test.c
  $

[1]
https://lore.kernel.org/bpf/CAN+4W8h3yDjkOLJPiuKVKTpj_08pBz8ke6vN=Lf8gcA=iYBM-g@mail.gmail.com/

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
2023-11-03 12:52:16 -07:00
Kazu Hirata
4a0ccfa865 Use llvm::endianness::{big,little,native} (NFC)
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
2023-10-12 21:21:45 -07:00
Kazu Hirata
a9d5056862 Use llvm::endianness (NFC)
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form.  This patch replaces
support::endianness with llvm::endianness.
2023-10-10 21:54:15 -07:00
Eduard Zingerman
f22442553b [BPF] Check jump and memory offsets to avoid truncation
The following assembly code should issue two errors specifying that
both jump and load offsets are out of range:

  if r1 > r2 goto +100500
  r1 = *(u64 *)(r1 - 100500)

This commit updates BPFAsmParser to check that:
- offset specified for jump is either identifier (label) or a 16-bit
  signed constant;
- offset specified for memory operations is a signed 16-bit constant.

(Which matches expectations in the BPFELFObjectWriter and
 BPFMCCodeEmitter).

Differential Revision: https://reviews.llvm.org/D158425
2023-09-23 19:39:24 +03:00
Eduard Zingerman
d15f96fe4b [BPF][DebugInfo] Show CO-RE relocations in llvm-objdump
Extend llvm-objdump to show CO-RE relocations when `-r` option is
passed and object file has .BTF and .BTF.ext sections.

For example, the following C program:

    #define __pai __attribute__((preserve_access_index))

    struct foo { int i; int j;} __pai;
    struct bar { struct foo f[7]; } __pai;
    extern void sink(void *);

    void root(struct bar *bar) {
      sink(&bar[2].f[3].j);
    }

Should lead to the following objdump output:

    $ clang --target=bpf -O2 -g t.c -c -o - | \
        llvm-objdump  --no-addresses --no-show-raw-insn -dr -

    ...
            r2 = 0x94
                    CO-RE <byte_off> [2] struct bar::[2].f[3].j (2:0:3:1)
            r1 += r2
            call -0x1
                    R_BPF_64_32     sink
            exit
    ...

More examples could be found in unit tests, see BTFParserTest.cpp.

To achieve this:
- Move CO-RE relocation kinds definitions from BPFCORE.h to BTF.h.
- Extend BTF.h with types derived from BTF::CommonType, e.g.
  BTF::IntType and BTF::StrutType, to allow dyn_cast() and access to
  type additional data.
- Extend BTFParser to load BTF type and relocation data.
- Modify llvm-objdump.cpp to create instance of BTFParser when
  disassembly of object file with BTF sections is processed and `-r`
  flag is supplied.

Additional information about CO-RE is available at [1].

[1] https://docs.kernel.org/bpf/llvm_reloc.html

Depends on D149058

Differential Revision: https://reviews.llvm.org/D150079
2023-09-21 21:59:10 +03:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Nick Desaulniers
86735a4353
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66264)
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)

This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.

Fix up build failures in targets I missed in #66003

Kept as 3 commits for reviewers to see better what's changed. Will
squash when
merging.

- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
2023-09-13 13:31:24 -07:00
Sergei Barannikov
a479be0f39 [MC] Change tryParseRegister to return ParseStatus (NFC)
This finishes the work of replacing OperandMatchResultTy with
ParseStatus, started in D154101.
As a drive-by change, rename some RegNo variables to just Reg
(a leftover from the days when RegNo had 'unsigned' type).
2023-09-06 10:28:12 +03:00
Eduard Zingerman
651e644595 [BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree()
Replace `BPFMIPeepholeTruncElim` by adding an overload for
`TargetLowering::isZExtFree()` aware that zero extension is
free for `ISD::LOAD`.

Short description
=================

The `BPFMIPeepholeTruncElim` handles two patterns:

Pattern #1:

    %1 = LDB %0, ...              %1 = LDB %0, ...
    %2 = AND_ri %1, 0xff      ->  %2 = MOV_ri %1    <-- (!)

Pattern #2:

    bb.1:                         bb.1:
      %a = LDB %0, ...              %a = LDB %0, ...
      br %bb3                       br %bb3
    bb.2:                         bb.2:
      %b = LDB %0, ...        ->    %b = LDB %0, ...
      br %bb3                       br %bb3
    bb.3:                         bb.3:
      %1 = PHI %a, %b               %1 = PHI %a, %b
      %2 = AND_ri %1, 0xff          %2 = MOV_ri %1  <-- (!)

Plus variations:
- AND_ri_32 instead of AND_ri
- SLL/SLR instead of AND_ri
- LDH, LDW, LDB32, LDH32, LDW32

Both patterns could be handled by built-in transformations at
instruction selection phase if suitable `isZExtFree()` implementation
is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`.

When evaluating on BPF kernel selftests and remove_truncate_*.ll LLVM
test cases this revisions performs slightly better than
BPFMIPeepholeTruncElim, see "Impact" section below for details.

Commit also adds a few test cases to make sure that patterns in
question are handled.

Long description
================

Why this works: Pattern #1
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command:

    ...
    Type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
          t19: i64 = and t16, Constant:i64<255>
        t17: i64 = setcc t19, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.1 t19: i64 = and t16, Constant:i64<255>
    With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
        t17: i64 = setcc t20, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- Optimized type-legalized selection DAG:
  - `t19 = and t16, 255` had been replaced by `t16` (load).
  - Patterns like `(and (load ... i8), 255)` are replaced by `load`
    in `DAGCombiner::BackwardsPropagateMask` called from
    `DAGCombiner::visitAND`.
  - Similarly patterns like `(shl (srl ..., 56), 56)` are replaced by
    `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge,
    look for `TLI.shouldFoldConstantShiftPairToMask()` call).

Why this works: Pattern #2
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      br label %next

    next:
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Consider log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command.
Log for first basic block:

    Initial selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 9 nodes:
      t0: ch,glue = EntryToken
      t3: i64 = Constant<0>
            t2: i64,ch = CopyFromReg t0, Register:i64 %1
          t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64
        t6: i64 = zero_extend t5
      t8: ch = CopyToReg t0, Register:i64 %0, t6
    ...
    Replacing.1 t6: i64 = zero_extend t5
    With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized lowered selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 7 nodes:
      t0: ch,glue = EntryToken
          t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
      t8: ch = CopyToReg t0, Register:i64 %0, t9

Note:
- Initial selection DAG:
  - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))`
    w/o special `isZExtFree()` overload added by this commit
    it is instead lowered as `t6 = (any_extend (load ...))`.
  - The decision to generate `zero_extend` or `any_extend` is
    done in `RegsForValue::getCopyToRegs` called from
    `SelectionDAGBuilder::CopyValueToVirtualRegister`:
    - if `isZExtFree()` for load returns true `zero_extend` is used;
    - `any_extend` is used otherwise.
- Optimized lowered selection DAG:
  - `t6 = (any_extend (load ...))` is replaced by
    `t9 = load ..., zext from i8`
    This is done by `DagCombiner.cpp:tryToFoldExtOfLoad()` called from
    `DAGCombiner::visitZERO_EXTEND`.

Log for second basic block:

    Initial selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
                t2: i64,ch = CopyFromReg t0, Register:i64 %0
              t4: i64 = AssertZext t2, ValueType:ch:i8
            t5: i8 = truncate t4
          t8: i1 = setcc t5, Constant:i8<0>, seteq:ch
        t9: i64 = any_extend t8
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.2 t18: i64 = and t4, Constant:i64<255>
    With: t4: i64 = AssertZext t2, ValueType:ch:i8
    ...
    Type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t4: i64 = AssertZext t2, ValueType:ch:i8
          t18: i64 = and t4, Constant:i64<255>
        t16: i64 = setcc t18, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Optimized type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t4: i64 = AssertZext t2, ValueType:ch:i8
        t16: i64 = setcc t4, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- Initial selection DAG:
  - `t0` is an input value for this basic block, it corresponds load
    instruction (`t9`) from the first basic block.
  - It is accessed within basic block via
    `t4` (AssertZext (CopyFromReg t0, ...)).
  - The `AssertZext` is generated by RegsForValue::getCopyFromRegs
    called from SelectionDAGBuilder::getCopyFromRegs, it is generated
    only when `LiveOutInfo` with known number of leading zeros is
    present for `t0`.
  - Known register bits in `LiveOutInfo` are computed by
    `SelectionDAG::computeKnownBits` called from
    `SelectionDAGISel::ComputeLiveOutVRegInfo`.
  - `computeKnownBits()` generates leading zeros information for
    `(load ..., zext from ...)` but *does not* generate leading zeros
    information for `(load ..., anyext from ...)`.
    This is why `isZExtFree()` added in this commit is important.
- Type-legalized selection DAG:
  - `t5 = truncate t4` is replaced by `t18 = and t4, 255`
- Optimized type-legalized selection DAG:
  - `t18 = and t4, 255` is replaced by `t4`, this is done by
    `DAGCombiner::SimplifyDemandedBits` called from
    `DAGCombiner::visitAND`, which simplifies patterns like
    `(and (assertzext ...))`

Impact
------

This change covers all remove_truncate_*.ll test cases:
- for -mcpu=v4 there are no changes in the generated code;
- for -mcpu=v2 code generated for remove_truncate_7 and
  remove_truncate_8 improved slightly, for other tests it is
  unchanged.

For remove_truncate_7:

    Before this revision                 After this revision
    --------------------                 -------------------
        r1 <<= 0x20                          r1 <<= 0x20
        r1 >>= 0x20                          r1 >>= 0x20
        if r1 == 0x0 goto +0x2 <LBB0_2>      if r1 == 0x0 goto +0x2 <LBB0_2>
        r1 = *(u32 *)(r2 + 0x0)              r0 = *(u32 *)(r2 + 0x0)
        goto +0x1 <LBB0_3>                   goto +0x1 <LBB0_3>
    <LBB0_2>:                            <LBB0_2>:
        r1 = *(u32 *)(r2 + 0x4)              r0 = *(u32 *)(r2 + 0x4)
    <LBB0_3>:                            <LBB0_3>:
        r0 = r1                              exit
        exit

For remove_truncate_8:

    Before this revision                 After this revision
    --------------------                 -------------------
        r2 = *(u32 *)(r1 + 0x0)              r2 = *(u32 *)(r1 + 0x0)
        r3 = r2                              r3 = r2
        r3 <<= 0x20                          r3 <<= 0x20
        r4 = r3                              r3 s>>= 0x20
        r4 s>>= 0x20
        if r4 s> 0x2 goto +0x5 <LBB0_3>      if r3 s> 0x2 goto +0x4 <LBB0_3>
        r4 = *(u32 *)(r1 + 0x4)              r3 = *(u32 *)(r1 + 0x4)
        r3 >>= 0x20
        if r3 >= r4 goto +0x2 <LBB0_3>       if r2 >= r3 goto +0x2 <LBB0_3>
        r2 += 0x2                            r2 += 0x2
        *(u32 *)(r1 + 0x0) = r2              *(u32 *)(r1 + 0x0) = r2
    <LBB0_3>:                            <LBB0_3>:
        r0 = 0x3                             r0 = 0x3
        exit                                 exit

For kernel BPF selftests statistics is as follows: (-mcpu=v4):
- For -mcpu=v4: 9 out of 655 object files have differences,
  in all cases total number of instructions marginally decreased
  (-27 instructions).
- For -mcpu=v2: 9 out of 655 object files have differences:
  - For 19 object files number of instruction decreased
    (-129 instruction in total): some redundant `rX &= 0xffff`
    and register to register assignments removed;
  - For 2 object files number of instructions increased +2
    instructions in each file.

Both -mcpu=v2 instruction increases could be reduced to the same
example:

    define void @foo(ptr %p) {
    entry:
      %a = load i32, ptr %p, align 4
      %b = sext i32 %a to i64
      %c = icmp ult i64 1, %b
      br i1 %c, label %next, label %end

    next:
      call void inttoptr (i64 62 to ptr)(i32 %a)
      br label %end

    end:
      ret void
    }

Note that this example uses value loaded to `%a` both as a sign
extended (`%b`) and as zero extended (`%a` passed as parameter).
Here is the difference in final assembly code:

    Before this revision          After this revision
    --------------------          -------------------
        r1 = *(u32 *)(r1 + 0)         r1 = *(u32 *)(r1 + 0)
        r1 <<= 32                     r1 <<= 32
        r1 s>>= 32                    r1 s>>= 32
        if r1 < 2 goto <LBB0_2>       if r1 < 2 goto <LBB0_2>
                                      r1 <<= 32
                                      r1 >>= 32
        call 62                       call 62
    <LBB0_2>:                     <LBB0_2>:
        exit                          exit

Before this commit `%a` is passed to call as a sign extended value,
after this commit `%a` is passed to call as a zero extended value,
both are correct as 32-bit sub-register is the same.

The difference comes from `DAGCombiner` operation on the initial DAG:

Initial selection DAG before this commit:

    t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
          t6: i64 = any_extend t5         <--------------------- (1)
        t8: ch = CopyToReg t0, Register:i64 %0, t6
            t9: i64 = sign_extend t5
          t12: i1 = setcc Constant:i64<1>, t9, setult:ch

Initial selection DAG after this commit:

    t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
          t6: i64 = zero_extend t5        <--------------------- (2)
        t8: ch = CopyToReg t0, Register:i64 %0, t6
            t9: i64 = sign_extend t5
          t12: i1 = setcc Constant:i64<1>, t9, setult:ch

The node `t9` is processed before node `t6` and `load` instruction is
combined to load with sign extension:

    Replacing.1 t9: i64 = sign_extend t5
    With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64
     and 0 other values
    Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
    With: t31: i32 = truncate t30
     and 1 other values

This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from
`DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6` which
is `any_extend` in (1) and `zero_extend` in (2).
`tryToFoldExtOfLoad()` rewrites such uses of `t5` differently:
- `any_extend` is simply removed
- `zero_extend` is replaced by `and t30, 0xffffffff`, which is later
  converted to a pair of shifts. This pair of shifts survives till the
  end of translation.

Differential Revision: https://reviews.llvm.org/D157870
2023-08-22 00:04:51 +03:00
Sergei Barannikov
0e79111e4d [AVR][BPF][Lanai][Xtensa] Replace OperandMatchResultTy with ParseStatus (NFC)
ParseStatus is slightly more convenient to use due to implicit
conversion from bool, which allows to do something like:
```
  return Error(L, "msg");
```
when with MatchOperandResultTy it had to be:
```
  Error(L, "msg");
  return MatchOperand_ParseFail;
```
It also has more appropriate name since parse* methods are not only for
parsing operands.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158275
2023-08-20 14:20:28 +03:00
Eduard Zingerman
8f28e8069c [BPF] support for BPF_ST instruction in codegen
Generate store immediate instruction when CPUv4 is enabled.
For example:

    $ cat test.c
    struct foo {
      unsigned char  b;
      unsigned short h;
      unsigned int   w;
      unsigned long  d;
    };
    void bar(volatile struct foo *p) {
      p->b = 1;
      p->h = 2;
      p->w = 3;
      p->d = 4;
    }

    $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
    ...
    0000000000000000 <bar>:
           0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
           1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
           2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
           3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
           4:	95 00 00 00 00 00 00 00	exit

Take special care to:
- apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST
- validate immediate value when BPF_ST write is 64-bit:
  BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with
  sign extension. Thus it is fine to generate such write when
  immediate is -1, but it is incorrect to generate such write when
  immediate is +0xffff_ffff.

This commit was previously reverted in e66affa17e32.
The reason for revert was an unrelated bug in BPF backend,
triggered by test case added in this commit if LLVM is built
with LLVM_ENABLE_EXPENSIVE_CHECKS.
The bug was fixed in D157806.

Differential Revision: https://reviews.llvm.org/D140804
2023-08-16 17:51:28 +03:00
Eduard Zingerman
08d92dedd2 [BPF] Fix in/out argument constraints for CORE_MEM instructions
When LLVM is build with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option the
following C code snippet:

    struct t {
      int a;
    } __attribute__((preserve_access_index));

    void test(struct t *t) {
      t->a = 42;
    }

Causes an assertion:

$ clang -g -O2 -c --target=bpf -mcpu=v2 t.c -o /dev/null

Function Live Ins: $r1 in %0

bb.0.entry:
  liveins: $r1
  DBG_VALUE $r1, $noreg, !"t", ...
  %0:gpr = COPY $r1
  DBG_VALUE %0:gpr, $noreg, !"t", ...
  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
  %4:gpr = MOV_ri 42
  CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
  RET debug-location !25; t.c:7:1

*** Bad machine code: Explicit definition marked as use ***
- function:    test
- basic block: %bb.0 entry (0x6210000d8a90)
- instruction: CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
- operand 0:   killed %4:gpr

This happens because `CORE_MEM` instruction is defined to have output
operands:

  def CORE_MEM : TYPE_LD_ST<BPF_MEM.Value, BPF_W.Value,
                            (outs GPR:$dst),
                            (ins u64imm:$opcode, GPR:$src, u64imm:$offset),
                            "$dst = core_mem($opcode, $src, $offset)",
                            []>;

As documented in [1]:

> By convention, the LLVM code generator orders instruction operands
> so that all register definitions come before the register uses, even
> on architectures that are normally printed in other orders.

In other words, the first argument for `CORE_MEM` is considered to be
a "def", while in reality it is "use":

  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
  %4:gpr = MOV_ri 42
   '---------------.
                   v
  CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...

Here is how `CORE_MEM` is constructed in
`BPFMISimplifyPatchable::checkADDrr()`:

    BuildMI(*DefInst->getParent(), *DefInst, DefInst->getDebugLoc(), TII->get(COREOp))
        .add(DefInst->getOperand(0)).addImm(Opcode).add(*BaseOp)
        .addGlobalAddress(GVal);

Note that first operand is constructed as `.add(DefInst->getOperand(0))`.

For `LD{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a
destination register of a load, so instruction is constructed in
accordance with `outs` declaration.

For `ST{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a
source register of a store (value to be stored), so instruction
violates the `outs` declaration.

This commit fixes the issue by splitting `CORE_MEM` in three
instructions: `CORE_ST`, `CORE_LD64`, `CORE_LD32` with correct `outs`
specifications.

[1] https://llvm.org/docs/CodeGenerator.html#the-machineinstr-class

Differential Revision: https://reviews.llvm.org/D157806
2023-08-15 02:34:21 +03:00
Eduard Zingerman
27026fe563 [BPF] Reset machine register kill mark in BPFMISimplifyPatchable
When LLVM is build with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option
the following C code snippet:

    struct t {
      unsigned long a;
    } __attribute__((preserve_access_index));

    void foo(volatile struct t *t, volatile unsigned long *p) {
      *p = t->a;
      *p = t->a;
    }

Causes an assertion:

    $ clang -g -O2 -c --target=bpf -mcpu=v2 t2.c -o /dev/null

    # After BPF PreEmit SimplifyPatchable
    # Machine code for function foo: IsSSA, TracksLiveness
    Function Live Ins: $r1 in %0, $r2 in %1

    bb.0.entry:
      liveins: $r1, $r2
      DBG_VALUE $r1, $noreg, !"t", !DIExpression()
      DBG_VALUE $r2, $noreg, !"p", !DIExpression()
      %1:gpr = COPY $r2
      DBG_VALUE %1:gpr, $noreg, !"p", !DIExpression()
      %0:gpr = COPY $r1
      DBG_VALUE %0:gpr, $noreg, !"t", !DIExpression()
      %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
      %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
      %5:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
      STD killed %5:gpr, %1:gpr, 0
      %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
      %8:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
      STD killed %8:gpr, %1:gpr, 0
      RET

    # End machine code for function foo.

    *** Bad machine code: Using a killed virtual register ***
    - function:    foo
    - basic block: %bb.0 entry (0x6210000e6690)
    - instruction: %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
    - operand 2:   killed %2:gpr

This happens because of the way
BPFMISimplifyPatchable::processDstReg() updates second operand of the
`ADD_rr` instruction. Code before `BPFMISimplifyPatchable`:

    .-> %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
    |
    |`----------------.
    |   %3:gpr = LDD %2:gpr, 0
    |   %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %3:gpr <--- (1)
    |   %5:gpr = LDD killed %4:gpr, 0       ^^^^^^^^^^^^^
    |   STD killed %5:gpr, %1:gpr, 0        this is updated
     `----------------.
        %6:gpr = LDD %2:gpr, 0
        %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %6:gpr <--- (2)
        %8:gpr = LDD killed %7:gpr, 0       ^^^^^^^^^^^^^
        STD killed %8:gpr, %1:gpr, 0        this is updated

Instructions (1) and (2) would be updated to:

    ADD_rr %0:gpr(tied-def 0), killed %2:gpr

The `killed` mark is inherited from machine operands `killed %3:gpr`
and `killed %6:gpr` which are updated inplace by `processDstReg()`.

This commit updates `processDstReg()` reset kill marks for updated
machine operands to keep liveness information conservatively correct.

Differential Revision: https://reviews.llvm.org/D157805
2023-08-15 02:23:38 +03:00
Eduard Zingerman
e66affa17e Revert "[BPF] support for BPF_ST instruction in codegen"
This reverts commit 92e28e397d4ccf1bff075f48e22cf1e23a7d02bf.

Reverting to investigate buildbot failure reported in [1].

    field-reloc-st-imm.ll:
    *** Bad machine code: Explicit definition must be a register ***
    - function:    bar
    - basic block: %bb.0 entry (0x742f318)
    - instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", ...
    - operand 0:   3
    *** Bad machine code: Explicit definition must be a register ***
    - function:    bar
    - basic block: %bb.0 entry (0x742f318)
    - instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", ...
    - operand 0:   4
    LLVM ERROR: Found 4 machine code errors.

[1] https://lab.llvm.org/buildbot/#/builders/16/builds/52877
2023-08-11 02:23:40 +03:00
Eduard Zingerman
92e28e397d [BPF] support for BPF_ST instruction in codegen
Generate store immediate instruction when CPUv4 is enabled.
For example:

    $ cat test.c
    struct foo {
      unsigned char  b;
      unsigned short h;
      unsigned int   w;
      unsigned long  d;
    };
    void bar(volatile struct foo *p) {
      p->b = 1;
      p->h = 2;
      p->w = 3;
      p->d = 4;
    }

    $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
    ...
    0000000000000000 <bar>:
           0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
           1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
           2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
           3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
           4:	95 00 00 00 00 00 00 00	exit

Take special care to:
- apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST
- validate immediate value when BPF_ST write is 64-bit:
  BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with
  sign extension. Thus it is fine to generate such write when
  immediate is -1, but it is incorrect to generate such write when
  immediate is +0xffff_ffff.

Differential Revision: https://reviews.llvm.org/D140804
2023-08-11 02:07:29 +03:00
Tamir Duberstein
055893beac
[BPF] Don't crash on missing line info
When compiling Rust code we may end up with calls to functions provided
by other code units. Presently this code crashes on a null pointer
dereference - this patch avoids that crash and adds a test.

Reviewed By: ast

Differential Revision: https://reviews.llvm.org/D156446
2023-08-03 09:18:12 -04:00
Tamir Duberstein
6b5e486428
Revert "[BPF] Narrow some interfaces"
Review requested by @ast and @yonghong-song.

This reverts commit 82bc1839bc53db296be36947ec0c482c8ca3a3d8.
2023-08-02 14:50:01 -04:00
Tamir Duberstein
82bc1839bc
[BPF] Narrow some interfaces
When compiling Rust code we sometimes see incomplete debug info leading
to crashes. Narrow the interfaces so we can see where it happens.

Reviewed By: ajwerner

Differential Revision: https://reviews.llvm.org/D156443
2023-08-02 14:08:30 -04:00
Tamir Duberstein
f2bd78415f
[BPF] Avoid repeating MI->getOperand(NumDefs) x3
Differential Revision: https://reviews.llvm.org/D156445
2023-08-02 10:56:01 -04:00
Tamir Duberstein
d542a56c1c [BPF] Clean up SelLowering
This patch contains a number of uncontroversial changes:
- Replace all uses of
  `errs`, `assert`, `llvm_unreachable` with `report_fatal_error` with
  informative error strings.
- Replace calls to `fail` in loops with at most one call per error
  instance. Previously a function with 19 arguments would log "too many
  args" 14 times. This was not helpful.
- Change one `if (..) switch ...` to `if (..) { switch ...`. The added
  brace is consistent with a near-identical switch immediately above.
- Elide one `SDValue` copy by using a reference rather than value. This
  is consistent with a variable declared immediately before it.

Reviewed By: yonghong-song

Differential Revision: https://reviews.llvm.org/D156136
2023-08-01 00:31:12 +03:00
Yonghong Song
6c412b6c6f [BPF] Add a few new insns under cpu=v4
In [1], a few new insns are proposed to expand BPF ISA to
  . fixing the limitation of existing insn (e.g., 16bit jmp offset)
  . adding new insns which may improve code quality
    (sign_ext_ld, sign_ext_mov, st)
  . feature complete (sdiv, smod)
  . better user experience (bswap)

This patch implemented insn encoding for
  . sign-extended load
  . sign-extended mov
  . sdiv/smod
  . bswap insns
  . unconditional jump with 32bit offset

The new bswap insns are generated under cpu=v4 for __builtin_bswap.
For cpu=v3 or earlier, for __builtin_bswap, be or le insns are generated
which is not intuitive for the user.

To support 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented.
For conditional branch which is beyond 16-bit offset, llvm will do
some transformation 'cond_jmp' -> 'cond_jmp + jmpl' to simulate 32bit
conditional jmp. See BPFMIPeephole.cpp for details. The algorithm is
hueristic based. I have tested bpf selftest pyperf600 with unroll account
600 which can indeed generate 32-bit jump insn, e.g.,
        13:       06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619>

Eduard is working on to add 'st' insn to cpu=v4.

A list of llc flags:
  disable-ldsx, disable-movsx, disable-bswap,
  disable-sdiv-smod, disable-gotol
can be used to disable a particular insn for cpu v4.
For example, user can do:
  llc -march=bpf -mcpu=v4 -disable-movsx t.ll
to enable cpu v4 without movsx insns.

References:
  [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/

Differential Revision: https://reviews.llvm.org/D144829
2023-07-26 08:37:30 -07:00
Eduard Zingerman
18e13739b8 [BPF] Undo transformation for LICM.cpp:hoistMinMax()
Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation
that undoes LICM hoistMinMax pass.

The undo transformation converts the following patterns:

    x < min(a, b) -> x < a && x < b
    x > min(a, b) -> x > a || x > b
    x < max(a, b) -> x < a || x < b
    x > max(a, b) -> x > a && x > b

Where 'a' or 'b' is a constant.
Also supports `sext min(...) ...` and `zext min(...) ...`.

~~~

This was previously commited as 09feee559a29 and reverted in
0bf9bfeacc8c because of the testbot memory leak report:
  https://lab.llvm.org/buildbot/#/builders/5/builds/34931

The memory leak issue was caused by incorrect instruction removal
sequence in skinMinMaxBB():

    I->dropAllReferences();  -------->  I->eraseFromParent();
    I->removeFromParent();   fixed to

Differential Revision: https://reviews.llvm.org/D147990
2023-07-11 22:30:34 +03:00
Eduard Zingerman
0e7ff05fb3 [BPF][DebugInfo][NFC] Move BTF.h definitions from BPF target to DebugInfo
There are plans to add some BTF processing to tools like objdump and
readelf. This commit moves BTF.{h,def} files from BPF target specific
location to include/llvm/DebugInfo/* to avoid tools including headers
from lib/Target/*.

Reviewed By: yonghong-song, MaskRay

Differential Revision: https://reviews.llvm.org/D149501
2023-07-10 14:50:21 -07:00