324 Commits

Author SHA1 Message Date
Yingchi Long
70deb7bfe9
[BPF] expand cttz, ctlz for i32, i64 (#73668)
Fixes: https://github.com/llvm/llvm-project/issues/62252

Depends on: #73667
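
For illustration, a generic software expansion of count-leading-zeros and
count-trailing-zeros can be sketched in C as below. This is a minimal sketch
of the well-known bit-smearing technique for 32-bit inputs; the helper names
`popcount32`, `ctlz32` and `cttz32` are hypothetical, and the instruction
sequence the BPF backend actually emits may differ.

```c
#include <stdint.h>

/* Count set bits with the classic parallel-sum trick. */
static uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (x * 0x01010101u) >> 24;
}

/* Leading zeros: smear the highest set bit downwards, then count
 * the bits that are still zero. Returns 32 for x == 0. */
uint32_t ctlz32(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return popcount32(~x);
}

/* Trailing zeros: ~x & (x - 1) keeps exactly the zero bits below
 * the lowest set bit. Returns 32 for x == 0. */
uint32_t cttz32(uint32_t x)
{
    return popcount32(~x & (x - 1));
}
```

Both helpers use only shifts, logic and one multiply, which is why such an
expansion suits a target without native bit-counting instructions.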
2024-04-01 10:57:54 +08:00
eddyz87
65b123e287
[BPF] rename 'arena' to 'address_space' (#85161)
There are a few places where `arena` name is used for pointers in
non-zero address space in BPF backend, rename these to use a more
generic `address_space`:
- macro `__BPF_FEATURE_ARENA_CAST` -> `__BPF_FEATURE_ADDR_SPACE_CAST`
- name for arena global variables section `.arena.N` ->
`.addr_space.N`
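
A BPF program could test for the renamed macro roughly as follows. This is
a sketch only: the `__arena` shorthand and the `read_arena_int` helper are
hypothetical names chosen for illustration, and the fallback branch simply
uses ordinary pointers when the compiler does not define the macro.

```c
/* Pick an address-space qualifier only when the compiler advertises
 * support through the renamed feature macro. */
#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
#define __arena __attribute__((address_space(1)))
#else
#define __arena /* plain pointers when the feature is unavailable */
#endif

/* Hypothetical accessor: loads an int through a (possibly) arena pointer. */
int read_arena_int(int __arena *p)
{
    return *p;
}
```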
2024-03-14 19:20:06 -07:00
4ast
2aacb56e83
BPF address space insn (#84410)
This commit aims to support BPF arena kernel side
[feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and
userspace;
- base pointers for this memory region differ between kernel and user
spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
translates src_reg, a pointer in src_addr_space to dst_reg, equivalent
pointer in dst_addr_space, {src,dst}_addr_space are immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on arena
pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
  arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from arena is done via
pointer in user address space, thus convert base pointers using
`addr_space_cast(src_reg, 0, 1)`;

Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.

For example, the following C code:

```c
   #define __as __attribute__((address_space(1)))
   void copy(int __as *from, int __as *to) { *to = *from; }
```

Compiled to the following IR:

```llvm
    define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
    entry:
      %0 = load i32, ptr addrspace(1) %from, align 4
      store i32 %0, ptr addrspace(1) %to, align 4
      ret void
    }
```

Is transformed to:

```llvm
    %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
    %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
    %0 = load i32, ptr %from1, align 4, !tbaa !3
    store i32 %0, ptr %to2, align 4, !tbaa !3
    ret void
```

And compiled as:

```asm
    r2 = addr_space_cast(r2, 0, 1)
    r1 = addr_space_cast(r1, 0, 1)
    r1 = *(u32 *)(r1 + 0)
    *(u32 *)(r2 + 0) = r1
    exit
```
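
The pointer translation described above can be pictured with a toy C model.
It assumes the arena is a single region mapped at two different base
addresses, so a cast only rebases the pointer; `struct arena_model` and
`addr_space_cast_model` are hypothetical names, and the kernel's real
implementation is not shown here.

```c
#include <stdint.h>

/* Hypothetical model: one arena visible at two base addresses. */
struct arena_model {
    uint64_t base[2]; /* base[0]: kernel space, base[1]: user space */
};

/* Translate src, a pointer in src_as, to the equivalent pointer in
 * dst_as by keeping the offset from the region base unchanged. */
uint64_t addr_space_cast_model(const struct arena_model *arena, uint64_t src,
                               unsigned dst_as, unsigned src_as)
{
    uint64_t offset = src - arena->base[src_as];
    return arena->base[dst_as] + offset;
}
```

With bases `0x1000` (kernel) and `0x9000` (user), the user pointer `0x9010`
maps to the kernel pointer `0x1010`, mirroring `addr_space_cast(r, 0, 1)`.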

Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>
2024-03-13 02:27:25 +02:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC)
2024-02-05 14:07:09 +01:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
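
The idea that a typed field access and a plain byte offset name the same
address can be shown in C. This is only an analogy for the `i8`-based GEP
form, written under the assumption of a simple two-field struct; `struct
pair` and both helper functions are hypothetical.

```c
#include <stddef.h>

struct pair {
    int x;
    int y;
};

/* Typed form: the compiler derives the offset from the struct layout. */
int *field_y_typed(struct pair *p)
{
    return &p->y;
}

/* Byte form: the same address expressed as base + constant byte offset,
 * which is the shape a constant GEP takes with an i8 element type. */
int *field_y_bytes(struct pair *p)
{
    return (int *)((char *)p + offsetof(struct pair, y));
}
```

Because both forms reduce to "base plus constant", an optimizer that sees
only the byte form has one spelling to compare instead of many.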
2024-01-24 15:25:29 +01:00
James Y Knight
b856e77b2d
Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703)
Targets affected:

- NVPTX and BPF: set to 64 bits.
- ARC, Lanai, and MSP430: set to 0 (they don't implement atomics).

Those which didn't yet add AtomicExpandPass to their pass pipeline now
do so.

This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass. On all these targets, this
now matches what Clang already does in the frontend.

The only targets which do not configure AtomicExpandPass now are:
- DirectX and SPIRV: they aren't normal backends.
- AVR: a single-cpu architecture with no privileged/user divide, which
could implement all atomics by disabling/enabling interrupts, regardless
of size/alignment. Will be addressed by future work.
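
The size rule described above can be summarized by a small C sketch. It is
a simplification that looks only at operation size (alignment is ignored),
and the enum and function names are hypothetical rather than LLVM APIs.

```c
/* Hypothetical outcome of lowering an atomic operation. */
enum atomic_lowering {
    ATOMIC_INLINE, /* handled by the target's own instructions */
    ATOMIC_LIBCALL /* expanded to an __atomic_* library call */
};

/* Operations wider than the target's supported maximum become libcalls.
 * A maximum of 0 means the target implements no atomics at all. */
enum atomic_lowering classify_atomic(unsigned size_bits,
                                     unsigned max_supported_bits)
{
    return size_bits <= max_supported_bits ? ATOMIC_INLINE : ATOMIC_LIBCALL;
}
```

For BPF and NVPTX (maximum 64) a 128-bit operation is classified as a
libcall, while for ARC, Lanai and MSP430 (maximum 0) every size is.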
2024-01-08 22:34:28 -05:00
Yingwei Zheng
1228becf7d
[FuncAttrs] Deduce noundef attributes for return values (#76553)
This patch deduces `noundef` attributes for return values.
IIUC, a function returns `noundef` values iff all of its return values
are guaranteed not to be `undef` or `poison`.
Definition of `noundef` from LangRef:
```
noundef
This attribute applies to parameters and return values. If the value representation contains any 
undefined or poison bits, the behavior is undefined. Note that this does not refer to padding 
introduced by the type’s storage representation.
```
Alive2: https://alive2.llvm.org/ce/z/g8Eis6

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%|

The motivation of this patch is to reduce the number of `freeze` insts
and enable more optimizations.
2023-12-31 20:44:48 +08:00
Yingchi Long
ddf85b92aa
[BPF] improve error handling by custom lowering & fail() (#75088)
Currently, on mcpu=v3 the sdiv and srem instructions are not supported,
and the backend crashes with a stack trace and core dump. This is
misleading for end users, because it is not a compiler "bug".

Add LLVM error reporting for sdiv/srem in the ISel legalize-op phase.

With the clang frontend we now get a detailed source location and error
report.

    $ build/bin/clang -g -target bpf -c local/sdiv.c
    local/sdiv.c:1:35: error: unsupported signed division, please convert to unsigned div/mod.
        1 | int sdiv(int a, int b) { return a / b; }
          |                                   ^
    1 error generated.

Fixes: #70433
Fixes: #48647

This also improves error handling for dynamic stack allocation:

    local/vla.c:2:3: error: unsupported dynamic stack allocation
        2 |   int b[n];
          |   ^
    1 error generated.

Fixes: https://github.com/llvm/llvm-project/issues/57171
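
One way to follow the "convert to unsigned div/mod" advice from the signed
division diagnostic is sketched below. The helper name `sdiv_via_udiv` is
hypothetical; the sketch assumes 32-bit operands, requires the caller to
guarantee a non-zero divisor, and relies on the usual two's-complement
conversion from `uint32_t` back to `int32_t`.

```c
#include <stdint.h>

/* Signed division built from an unsigned division plus sign fix-up.
 * The caller must ensure b != 0. */
int32_t sdiv_via_udiv(int32_t a, int32_t b)
{
    /* Magnitudes, computed in unsigned arithmetic to avoid overflow. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t q = ua / ub;

    /* The quotient is negative only when the operand signs differ. */
    return ((a < 0) != (b < 0)) ? (int32_t)(0u - q) : (int32_t)q;
}
```

The result truncates toward zero, matching C's `/` operator for signed
integers.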
2023-12-13 13:41:52 +08:00
Yingchi Long
c4ac1d239f
[BPF][GlobalISel] select non-PreISelGenericOpcode (#75034)
This selects non-PreISelGenericOpcode as-is.

Depends on: #74999

Co-authored-by: Origami404 <Origami404@foxmail.com>
2023-12-12 16:19:34 +08:00
Yingchi Long
2460bf2fac
[BPF][GlobalISel] add initial gisel support for BPF (#74999)
This adds initial codegen support for BPF backend.

Only the IR translator for "RET" is implemented (instruction selection is
not supported yet).

Depends on: #74998
2023-12-11 19:58:34 +08:00
Eduard Zingerman
030b8cb156 [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. If a register with such a tag is modified, BPF
  programs are not allowed to read or write memory using that register.
  See `kernel/bpf/verifier.c:check_mem_access` function in the Linux
  kernel source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The `__attribute__((preserve_static_offset))` has priority over
`__attribute__((preserve_access_index))`. When the
`preserve_static_offset` attribute is present, preserve access index
transformations are not applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

This is a second attempt to commit this change, previous reverted
commit is: cb13e9286b6d4e384b5d4203e853d44e2eff0f0f.
The following items had been fixed:
- test case bpf-preserve-static-offset-bitfield.c now uses
  `-triple bpfel` to avoid different codegen for little/big endian
  targets.
- BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid
  use after free for `WorkList` elements `V`.

Differential Revision: https://reviews.llvm.org/D133361
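
A declaration using the attribute might look like the sketch below. The
`__ctx` shorthand, `struct my_ctx` and `read_mark` are hypothetical names;
the guard falls back to an empty macro on compilers or targets that do not
report the attribute, so the snippet also builds outside of BPF.

```c
/* Define __ctx only where the compiler reports the attribute. */
#ifdef __has_attribute
#if __has_attribute(preserve_static_offset)
#define __ctx __attribute__((preserve_static_offset))
#endif
#endif
#ifndef __ctx
#define __ctx /* attribute unavailable: no special handling */
#endif

/* Hypothetical "context" structure marked with the attribute. */
struct my_ctx {
    int len;
    int mark;
} __ctx;

/* A field read that should stay a single load with a static offset. */
int read_mark(struct my_ctx *ctx)
{
    return ctx->mark;
}
```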
2023-12-05 19:21:42 +02:00
Eduard Zingerman
2484469803 Revert "[BPF] Attribute preserve_static_offset for structs"
This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f.
Buildbot reports MSAN failures in tests added in this commit:
https://lab.llvm.org/buildbot/#/builders/5/builds/38806

Failing tests:
  LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll
2023-11-30 22:29:45 +02:00
Eduard Zingerman
cb13e9286b [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. If a register with such a tag is modified, BPF
  programs are not allowed to read or write memory using that register.
  See `kernel/bpf/verifier.c:check_mem_access` function in the Linux
  kernel source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The `__attribute__((preserve_static_offset))` has priority over
`__attribute__((preserve_access_index))`. When the
`preserve_static_offset` attribute is present, preserve access index
transformations are not applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

Differential Revision: https://reviews.llvm.org/D133361
2023-11-30 19:45:03 +02:00
Philip Reames
a7f35d54ee
[SCEV] Extend isImpliedCondOperandsViaRanges to independent predicates (#71110)
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.

Noticed while investigating something else; this is purely an
opportunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
2023-11-07 07:25:47 -08:00
yonghong-song
32e35b21b5
[BPF] Skip modifiers for __builtin_btf_type_id() local type (#71094)
BPF upstream reported an inconsistent behavior w.r.t. BPF_TYPE_ID_LOCAL
vs. BPF_TYPE_ID_TARGET (or BPF_TYPE_ID_REMOTE in LLVM terminology).

For BPF_TYPE_ID_TARGET, all modifiers (like 'const' and 'volatile') are
ignored in the final type encoding. For example, for type
'const struct foo', the eventual encoding in the BTF relocation
is 'struct foo'. This lets libbpf match corresponding kernel
types without considering any modifiers.

The current behavior for BPF_TYPE_ID_LOCAL is different. It encodes
'const struct foo' in the BTF relocation, and this discrepancy confused
users ([1]).

This patch fixes the discrepancy by making the BPF_TYPE_ID_LOCAL BTF type
representation the same as BPF_TYPE_ID_TARGET. This should have minimal
user impact since ultimately the user wants to get a real type, not a
'const' type modifier.

The selftest builtin-btf-type-id-2.ll is used to test BPF_TYPE_ID_TARGET
with the 'const' modifier. Adapt the same test for BPF_TYPE_ID_LOCAL. The
diff below shows that both BPF_TYPE_ID_LOCAL and BPF_TYPE_ID_TARGET now
produce the same type:

  $ diff test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll
  --- test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll 2023-07-30 16:58:20.657528310 -0700
  +++ test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll 2023-11-02 10:23:25.356959008 -0700
  @@ -6,7 +6,7 @@
   ;     int a;
   ;   };
   ;   int test(void) {
  -;     return __builtin_btf_type_id(*(const struct s *)0, 1);
  +;     return __builtin_btf_type_id(*(const struct s *)0, 0);
   ;   }
   ; Compilation flag:
   ; clang -target bpf -O2 -g -S -emit-llvm -Xclang -disable-llvm-passes test.c
  $

[1]
https://lore.kernel.org/bpf/CAN+4W8h3yDjkOLJPiuKVKTpj_08pBz8ke6vN=Lf8gcA=iYBM-g@mail.gmail.com/
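
Skipping modifiers can be pictured with a toy type table in C. This is not
LLVM's or BTF's actual data structure: `struct toy_type`, the `toy_kind`
enum and `skip_modifiers` are hypothetical, and the model assumes each
modifier entry refers to the type it qualifies by index.

```c
/* Hypothetical type kinds: one real type and two modifiers. */
enum toy_kind { TOY_STRUCT, TOY_CONST, TOY_VOLATILE };

struct toy_type {
    enum toy_kind kind;
    int ref; /* index of the qualified type; unused for TOY_STRUCT */
};

/* Follow const/volatile links until a non-modifier type is reached. */
int skip_modifiers(const struct toy_type *types, int id)
{
    while (types[id].kind == TOY_CONST || types[id].kind == TOY_VOLATILE)
        id = types[id].ref;
    return id;
}
```

In this model a request for 'const struct foo' and one for 'struct foo'
resolve to the same index, which is the consistency the patch aims for.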

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
2023-11-03 12:52:16 -07:00
Philip Reames
f6f769203d [tests] Autogenerate a couple of tests
As usual, making it easier for an upcoming test delta to be seen.

Note that several of these are examples of extremely bad testing practice.
Checking internal debug output (for no real purpose), and checking the
result of a fully O2 + llc run instead of reducing the specific problematic
pass.
2023-11-03 08:42:23 -07:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible is to differentiate a missing
datalayout from an explicit `target datalayout = ""` in the IR file,
since the current APIs don't allow detecting this case. If it is
considered useful to support this case (instead of passing
"-data-layout=" on the command line), I can change IR parsers to track
whether they have seen such a directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Nikita Popov
2ad9fde418
[MemDep] Use EarliestEscapeInfo (#69727)
Use BatchAA with EarliestEscapeInfo instead of callCapturesBefore() in
MemDepAnalysis. The advantage of this is that it will also take
not-captured-before information into account for non-calls (see
test_store_before_capture for a representative example), and that this
is a cached analysis. The disadvantage is that EII is slightly less
precise than full CapturedBefore analysis.

In practice the impact is positive, with gvn.NumGVNLoad going from 22022
to 22808 on test-suite. 

The impact to compile-time is also positive, mainly in the ThinLTO
configuration.
2023-10-23 09:57:26 +02:00
Nikita Popov
a72d88fb4f Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 8840da2db237cd714d975c199d5992945d2b71e9.

This results in verifier failures during LTO, see #68929.
2023-10-16 12:17:24 +02:00
Nikita Popov
8840da2db2 Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size
Reapply now that generation of incorrect debuginfo for FnDef
in rustc has been fixed.

-----

Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo regenerated by rustc
in https://github.com/llvm/llvm-project/issues/64149.

Differential Revision: https://reviews.llvm.org/D158743
2023-10-09 14:22:12 +02:00
Alex Richardson
83c4227ab7 Auto-generate test checks for tests affected by D141060
These files had manual CHECK lines which make the diff from D141060
very difficult to review.
2023-10-04 10:51:35 -07:00
Nikita Popov
38c59b9f53 Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 47324cfd7d8ca1a2a5cbb9f948ecff66a28ee6bc.

This exposed incorrect debuginfo in rustc. Revert the verification
until this has been fixed.
2023-09-18 17:24:53 +02:00
Nikita Popov
47324cfd7d Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size
Reapply after fixing a clang bug this exposed in D158972 and
adjusting a number of tests that failed for 32-bit targets.

-----

Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo regenerated by rustc
in https://github.com/llvm/llvm-project/issues/64149.

Differential Revision: https://reviews.llvm.org/D158743
2023-09-15 14:51:50 +02:00
Fangrui Song
806761a762 [test] Change llc -march= to -mtriple=
The issue is uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead
of rejecting it outright.
2023-09-11 14:42:37 -07:00
Nikita Popov
98cf20f890 Revert "[Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 183f49c3e0f4a7facf237581f83ae07e7f4544ab.

The lang/cpp/trivial_abi/TestTrivialABI.py lldb test fails on
buildbots.
2023-08-28 09:44:51 +02:00
Nikita Popov
183f49c3e0 [Verifier] Sanity check alloca size against DILocalVariable fragment size
Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo regenerated by rustc
in https://github.com/llvm/llvm-project/issues/64149.

Differential Revision: https://reviews.llvm.org/D158743
2023-08-28 09:16:33 +02:00
Eduard Zingerman
651e644595 [BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree()
Replace `BPFMIPeepholeTruncElim` by adding an overload for
`TargetLowering::isZExtFree()` aware that zero extension is
free for `ISD::LOAD`.

Short description
=================

The `BPFMIPeepholeTruncElim` handles two patterns:

Pattern #1:

    %1 = LDB %0, ...              %1 = LDB %0, ...
    %2 = AND_ri %1, 0xff      ->  %2 = MOV_ri %1    <-- (!)

Pattern #2:

    bb.1:                         bb.1:
      %a = LDB %0, ...              %a = LDB %0, ...
      br %bb3                       br %bb3
    bb.2:                         bb.2:
      %b = LDB %0, ...        ->    %b = LDB %0, ...
      br %bb3                       br %bb3
    bb.3:                         bb.3:
      %1 = PHI %a, %b               %1 = PHI %a, %b
      %2 = AND_ri %1, 0xff          %2 = MOV_ri %1  <-- (!)

Plus variations:
- AND_ri_32 instead of AND_ri
- SLL/SLR instead of AND_ri
- LDH, LDW, LDB32, LDH32, LDW32

Both patterns could be handled by built-in transformations at
instruction selection phase if suitable `isZExtFree()` implementation
is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`.

When evaluated on BPF kernel selftests and remove_truncate_*.ll LLVM
test cases, this revision performs slightly better than
BPFMIPeepholeTruncElim, see "Impact" section below for details.

Commit also adds a few test cases to make sure that patterns in
question are handled.
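
The reason the `AND` in Pattern #1 is redundant can be checked with a small
C model. It assumes the byte load zero-extends into a 64-bit register, as
described above; `load_u8_zext` and `mask_is_redundant` are hypothetical
helper names.

```c
#include <stdint.h>

/* Models a zero-extending byte load into a 64-bit register. */
uint64_t load_u8_zext(const uint8_t *p)
{
    return (uint64_t)*p;
}

/* After a zero-extending load the upper 56 bits are already zero,
 * so masking with 0xff cannot change the value. */
int mask_is_redundant(const uint8_t *p)
{
    uint64_t v = load_u8_zext(p);
    return (v & 0xff) == v;
}
```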

Long description
================

Why this works: Pattern #1
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command:

    ...
    Type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
          t19: i64 = and t16, Constant:i64<255>
        t17: i64 = setcc t19, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.1 t19: i64 = and t16, Constant:i64<255>
    With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
        t17: i64 = setcc t20, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- Optimized type-legalized selection DAG:
  - `t19 = and t16, 255` had been replaced by `t16` (load).
  - Patterns like `(and (load ... i8), 255)` are replaced by `load`
    in `DAGCombiner::BackwardsPropagateMask` called from
    `DAGCombiner::visitAND`.
  - Similarly patterns like `(srl (shl ..., 56), 56)` are replaced by
    `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge,
    look for `TLI.shouldFoldConstantShiftPairToMask()` call).
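
The shift-pair folding rests on a simple identity: shifting a 64-bit value
left by 56 and then logically right by 56 keeps only its low byte, which is
the same as masking with 255. A C sketch of that identity, with hypothetical
helper names:

```c
#include <stdint.h>

/* Low byte extracted with a left shift followed by a logical right shift. */
uint64_t low_byte_by_shifts(uint64_t x)
{
    return (x << 56) >> 56;
}

/* Low byte extracted with a mask; equivalent to the shift pair. */
uint64_t low_byte_by_mask(uint64_t x)
{
    return x & 0xff;
}
```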

Why this works: Pattern #2
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      br label %next

    next:
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Consider log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command.
Log for first basic block:

    Initial selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 9 nodes:
      t0: ch,glue = EntryToken
      t3: i64 = Constant<0>
            t2: i64,ch = CopyFromReg t0, Register:i64 %1
          t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64
        t6: i64 = zero_extend t5
      t8: ch = CopyToReg t0, Register:i64 %0, t6
    ...
    Replacing.1 t6: i64 = zero_extend t5
    With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized lowered selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 7 nodes:
      t0: ch,glue = EntryToken
          t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
      t8: ch = CopyToReg t0, Register:i64 %0, t9

Note:
- Initial selection DAG:
  - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))`;
    without the special `isZExtFree()` overload added by this commit
    it is instead lowered as `t6 = (any_extend (load ...))`.
  - The decision to generate `zero_extend` or `any_extend` is
    done in `RegsForValue::getCopyToRegs` called from
    `SelectionDAGBuilder::CopyValueToVirtualRegister`:
    - if `isZExtFree()` for load returns true `zero_extend` is used;
    - `any_extend` is used otherwise.
- Optimized lowered selection DAG:
  - `t6 = (zero_extend (load ...))` is replaced by
    `t9 = load ..., zext from i8`.
    This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad()` called from
    `DAGCombiner::visitZERO_EXTEND`.

Log for second basic block:

    Initial selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
                t2: i64,ch = CopyFromReg t0, Register:i64 %0
              t4: i64 = AssertZext t2, ValueType:ch:i8
            t5: i8 = truncate t4
          t8: i1 = setcc t5, Constant:i8<0>, seteq:ch
        t9: i64 = any_extend t8
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.2 t18: i64 = and t4, Constant:i64<255>
    With: t4: i64 = AssertZext t2, ValueType:ch:i8
    ...
    Type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t4: i64 = AssertZext t2, ValueType:ch:i8
          t18: i64 = and t4, Constant:i64<255>
        t16: i64 = setcc t18, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Optimized type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t4: i64 = AssertZext t2, ValueType:ch:i8
        t16: i64 = setcc t4, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- Initial selection DAG:
  - `t0` is an input value for this basic block, it corresponds to the
    load instruction (`t9`) from the first basic block.
  - It is accessed within basic block via
    `t4` (AssertZext (CopyFromReg t0, ...)).
  - The `AssertZext` is generated by RegsForValue::getCopyFromRegs
    called from SelectionDAGBuilder::getCopyFromRegs, it is generated
    only when `LiveOutInfo` with known number of leading zeros is
    present for `t0`.
  - Known register bits in `LiveOutInfo` are computed by
    `SelectionDAG::computeKnownBits` called from
    `SelectionDAGISel::ComputeLiveOutVRegInfo`.
  - `computeKnownBits()` generates leading zeros information for
    `(load ..., zext from ...)` but *does not* generate leading zeros
    information for `(load ..., anyext from ...)`.
    This is why `isZExtFree()` added in this commit is important.
- Type-legalized selection DAG:
  - `t5 = truncate t4` is replaced by `t18 = and t4, 255`
- Optimized type-legalized selection DAG:
  - `t18 = and t4, 255` is replaced by `t4`, this is done by
    `DAGCombiner::SimplifyDemandedBits` called from
    `DAGCombiner::visitAND`, which simplifies patterns like
    `(and (assertzext ...))`

Impact
------

This change covers all remove_truncate_*.ll test cases:
- for -mcpu=v4 there are no changes in the generated code;
- for -mcpu=v2 code generated for remove_truncate_7 and
  remove_truncate_8 improved slightly, for other tests it is
  unchanged.

For remove_truncate_7:

    Before this revision                 After this revision
    --------------------                 -------------------
        r1 <<= 0x20                          r1 <<= 0x20
        r1 >>= 0x20                          r1 >>= 0x20
        if r1 == 0x0 goto +0x2 <LBB0_2>      if r1 == 0x0 goto +0x2 <LBB0_2>
        r1 = *(u32 *)(r2 + 0x0)              r0 = *(u32 *)(r2 + 0x0)
        goto +0x1 <LBB0_3>                   goto +0x1 <LBB0_3>
    <LBB0_2>:                            <LBB0_2>:
        r1 = *(u32 *)(r2 + 0x4)              r0 = *(u32 *)(r2 + 0x4)
    <LBB0_3>:                            <LBB0_3>:
        r0 = r1                              exit
        exit

For remove_truncate_8:

    Before this revision                 After this revision
    --------------------                 -------------------
        r2 = *(u32 *)(r1 + 0x0)              r2 = *(u32 *)(r1 + 0x0)
        r3 = r2                              r3 = r2
        r3 <<= 0x20                          r3 <<= 0x20
        r4 = r3                              r3 s>>= 0x20
        r4 s>>= 0x20
        if r4 s> 0x2 goto +0x5 <LBB0_3>      if r3 s> 0x2 goto +0x4 <LBB0_3>
        r4 = *(u32 *)(r1 + 0x4)              r3 = *(u32 *)(r1 + 0x4)
        r3 >>= 0x20
        if r3 >= r4 goto +0x2 <LBB0_3>       if r2 >= r3 goto +0x2 <LBB0_3>
        r2 += 0x2                            r2 += 0x2
        *(u32 *)(r1 + 0x0) = r2              *(u32 *)(r1 + 0x0) = r2
    <LBB0_3>:                            <LBB0_3>:
        r0 = 0x3                             r0 = 0x3
        exit                                 exit

For kernel BPF selftests statistics is as follows: (-mcpu=v4):
- For -mcpu=v4: 9 out of 655 object files have differences,
  in all cases total number of instructions marginally decreased
  (-27 instructions).
- For -mcpu=v2: 9 out of 655 object files have differences:
  - For 19 object files the number of instructions decreased
    (-129 instructions in total): some redundant `rX &= 0xffff`
    and register-to-register assignments were removed;
  - For 2 object files the number of instructions increased, by 2
    instructions in each file.

Both -mcpu=v2 instruction increases could be reduced to the same
example:

    define void @foo(ptr %p) {
    entry:
      %a = load i32, ptr %p, align 4
      %b = sext i32 %a to i64
      %c = icmp ult i64 1, %b
      br i1 %c, label %next, label %end

    next:
      call void inttoptr (i64 62 to ptr)(i32 %a)
      br label %end

    end:
      ret void
    }

Note that this example uses the value loaded into `%a` both sign
extended (`%b`) and zero extended (`%a` passed as a parameter).
Here is the difference in the final assembly code:

    Before this revision          After this revision
    --------------------          -------------------
        r1 = *(u32 *)(r1 + 0)         r1 = *(u32 *)(r1 + 0)
        r1 <<= 32                     r1 <<= 32
        r1 s>>= 32                    r1 s>>= 32
        if r1 < 2 goto <LBB0_2>       if r1 < 2 goto <LBB0_2>
                                      r1 <<= 32
                                      r1 >>= 32
        call 62                       call 62
    <LBB0_2>:                     <LBB0_2>:
        exit                          exit

Before this commit `%a` is passed to the call as a sign extended value,
after this commit `%a` is passed to the call as a zero extended value.
Both are correct because the 32-bit sub-register is the same.
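
The sub-register argument can be sketched in plain C. This is a minimal
illustration, not code from the patch; the helper names `widen_sext32`,
`widen_zext32` and `subreg32` are hypothetical and only model what the
shift pairs in the listing above do to a 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits (the effect of `<<= 32; s>>= 32`). */
uint64_t widen_sext32(uint32_t w) {
    return (w & 0x80000000u) ? (0xffffffff00000000ull | w) : (uint64_t)w;
}

/* Zero-extend a 32-bit value to 64 bits (the effect of `<<= 32; >>= 32`). */
uint64_t widen_zext32(uint32_t w) {
    return (uint64_t)w;
}

/* The 32-bit sub-register: the low half of a 64-bit register. */
uint32_t subreg32(uint64_t r) {
    return (uint32_t)r;
}

/* Illustrative check: the upper halves differ, the sub-register does not. */
void check_subreg_agrees(void) {
    uint32_t a = 0x80000001u;
    assert(widen_sext32(a) == 0xffffffff80000001ull);
    assert(widen_zext32(a) == 0x0000000080000001ull);
    assert(subreg32(widen_sext32(a)) == subreg32(widen_zext32(a)));
}
```

A callee that only reads the low 32 bits of its argument sees the same
value in both cases, which is why either extension is acceptable here.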

The difference comes from `DAGCombiner` operation on the initial DAG:

Initial selection DAG before this commit:

    t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
          t6: i64 = any_extend t5         <--------------------- (1)
        t8: ch = CopyToReg t0, Register:i64 %0, t6
            t9: i64 = sign_extend t5
          t12: i1 = setcc Constant:i64<1>, t9, setult:ch

Initial selection DAG after this commit:

    t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
          t6: i64 = zero_extend t5        <--------------------- (2)
        t8: ch = CopyToReg t0, Register:i64 %0, t6
            t9: i64 = sign_extend t5
          t12: i1 = setcc Constant:i64<1>, t9, setult:ch

The node `t9` is processed before node `t6`, and the `load` instruction
is combined into a load with sign extension:

    Replacing.1 t9: i64 = sign_extend t5
    With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64
     and 0 other values
    Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
    With: t31: i32 = truncate t30
     and 1 other values

This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from
`DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6` which
is `any_extend` in (1) and `zero_extend` in (2).
`tryToFoldExtOfLoad()` rewrites such uses of `t5` differently:
- `any_extend` is simply removed
- `zero_extend` is replaced by `and t30, 0xffffffff`, which is later
  converted to a pair of shifts. This pair of shifts survives till the
  end of translation.
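
The equivalence between the `and` mask and the pair of shifts can be
checked with a small C sketch. The function names are hypothetical and
the code only models 64-bit register arithmetic; it is not taken from
the DAG combiner.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension written as a mask, like `and t30, 0xffffffff`. */
uint64_t zext32_by_mask(uint64_t r) {
    return r & 0xffffffffull;
}

/* Zero extension written as a pair of shifts, like `r1 <<= 32; r1 >>= 32`. */
uint64_t zext32_by_shifts(uint64_t r) {
    r <<= 32; /* discard the upper 32 bits */
    r >>= 32; /* logical shift brings zeros in from the top */
    return r;
}

/* Illustrative check over a few boundary values. */
void check_mask_equals_shifts(void) {
    const uint64_t samples[] = {0, 1, 0x80000000ull, 0xffffffffull,
                                0xffffffff80000001ull, UINT64_MAX};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(zext32_by_mask(samples[i]) == zext32_by_shifts(samples[i]));
}
```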

Differential Revision: https://reviews.llvm.org/D157870
2023-08-22 00:04:51 +03:00
Eduard Zingerman
8f28e8069c [BPF] support for BPF_ST instruction in codegen
Generate store immediate instruction when CPUv4 is enabled.
For example:

    $ cat test.c
    struct foo {
      unsigned char  b;
      unsigned short h;
      unsigned int   w;
      unsigned long  d;
    };
    void bar(volatile struct foo *p) {
      p->b = 1;
      p->h = 2;
      p->w = 3;
      p->d = 4;
    }

    $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
    ...
    0000000000000000 <bar>:
           0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
           1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
           2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
           3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
           4:	95 00 00 00 00 00 00 00	exit

Take special care to:
- apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST
- validate immediate value when BPF_ST write is 64-bit:
  BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with
  sign extension. Thus it is fine to generate such write when
  immediate is -1, but it is incorrect to generate such write when
  immediate is +0xffff_ffff.
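
The immediate check described above can be sketched as follows. This is
a minimal illustration under the assumption that the instruction holds a
32-bit immediate which is sign extended to 64 bits on a `BPF_DW` store;
`fits_sext_imm32` is a hypothetical helper, not the function used in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when `value` equals some 32-bit immediate sign extended to 64 bits,
 * i.e. when bits 63..31 are all zeros or all ones. */
bool fits_sext_imm32(uint64_t value) {
    return value <= 0x7fffffffull || value >= 0xffffffff80000000ull;
}

/* Illustrative check for the two cases named in the commit message. */
void check_st_imm_examples(void) {
    assert(fits_sext_imm32((uint64_t)-1));    /* -1: fine            */
    assert(!fits_sext_imm32(0xffffffffull));  /* +0xffff_ffff: wrong */
    assert(fits_sext_imm32(4));               /* p->d = 4 in test.c  */
}
```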

This commit was previously reverted in e66affa17e32.
The reason for revert was an unrelated bug in BPF backend,
triggered by test case added in this commit if LLVM is built
with LLVM_ENABLE_EXPENSIVE_CHECKS.
The bug was fixed in D157806.

Differential Revision: https://reviews.llvm.org/D140804
2023-08-16 17:51:28 +03:00
Eduard Zingerman
08d92dedd2 [BPF] Fix in/out argument constraints for CORE_MEM instructions
When LLVM is built with the `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option, the
following C code snippet:

    struct t {
      int a;
    } __attribute__((preserve_access_index));

    void test(struct t *t) {
      t->a = 42;
    }

Causes an assertion:

    $ clang -g -O2 -c --target=bpf -mcpu=v2 t.c -o /dev/null

    Function Live Ins: $r1 in %0

    bb.0.entry:
      liveins: $r1
      DBG_VALUE $r1, $noreg, !"t", ...
      %0:gpr = COPY $r1
      DBG_VALUE %0:gpr, $noreg, !"t", ...
      %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
      %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
      %4:gpr = MOV_ri 42
      CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
      RET debug-location !25; t.c:7:1

    *** Bad machine code: Explicit definition marked as use ***
    - function:    test
    - basic block: %bb.0 entry (0x6210000d8a90)
    - instruction: CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
    - operand 0:   killed %4:gpr

This happens because `CORE_MEM` instruction is defined to have output
operands:

  def CORE_MEM : TYPE_LD_ST<BPF_MEM.Value, BPF_W.Value,
                            (outs GPR:$dst),
                            (ins u64imm:$opcode, GPR:$src, u64imm:$offset),
                            "$dst = core_mem($opcode, $src, $offset)",
                            []>;

As documented in [1]:

> By convention, the LLVM code generator orders instruction operands
> so that all register definitions come before the register uses, even
> on architectures that are normally printed in other orders.

In other words, the first argument for `CORE_MEM` is considered to be
a "def", while in reality it is "use":

  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
  %4:gpr = MOV_ri 42
   '---------------.
                   v
  CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...

Here is how `CORE_MEM` is constructed in
`BPFMISimplifyPatchable::checkADDrr()`:

    BuildMI(*DefInst->getParent(), *DefInst, DefInst->getDebugLoc(), TII->get(COREOp))
        .add(DefInst->getOperand(0)).addImm(Opcode).add(*BaseOp)
        .addGlobalAddress(GVal);

Note that first operand is constructed as `.add(DefInst->getOperand(0))`.

For `LD{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a
destination register of a load, so instruction is constructed in
accordance with `outs` declaration.

For `ST{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a
source register of a store (value to be stored), so instruction
violates the `outs` declaration.

This commit fixes the issue by splitting `CORE_MEM` into three
instructions: `CORE_ST`, `CORE_LD64`, `CORE_LD32` with correct `outs`
specifications.

[1] https://llvm.org/docs/CodeGenerator.html#the-machineinstr-class

Differential Revision: https://reviews.llvm.org/D157806
2023-08-15 02:34:21 +03:00
Eduard Zingerman
27026fe563 [BPF] Reset machine register kill mark in BPFMISimplifyPatchable
When LLVM is built with the `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option,
the following C code snippet:

    struct t {
      unsigned long a;
    } __attribute__((preserve_access_index));

    void foo(volatile struct t *t, volatile unsigned long *p) {
      *p = t->a;
      *p = t->a;
    }

Causes an assertion:

    $ clang -g -O2 -c --target=bpf -mcpu=v2 t2.c -o /dev/null

    # After BPF PreEmit SimplifyPatchable
    # Machine code for function foo: IsSSA, TracksLiveness
    Function Live Ins: $r1 in %0, $r2 in %1

    bb.0.entry:
      liveins: $r1, $r2
      DBG_VALUE $r1, $noreg, !"t", !DIExpression()
      DBG_VALUE $r2, $noreg, !"p", !DIExpression()
      %1:gpr = COPY $r2
      DBG_VALUE %1:gpr, $noreg, !"p", !DIExpression()
      %0:gpr = COPY $r1
      DBG_VALUE %0:gpr, $noreg, !"t", !DIExpression()
      %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
      %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
      %5:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
      STD killed %5:gpr, %1:gpr, 0
      %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
      %8:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
      STD killed %8:gpr, %1:gpr, 0
      RET

    # End machine code for function foo.

    *** Bad machine code: Using a killed virtual register ***
    - function:    foo
    - basic block: %bb.0 entry (0x6210000e6690)
    - instruction: %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
    - operand 2:   killed %2:gpr

This happens because of the way
BPFMISimplifyPatchable::processDstReg() updates second operand of the
`ADD_rr` instruction. Code before `BPFMISimplifyPatchable`:

    .-> %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
    |
    |`----------------.
    |   %3:gpr = LDD %2:gpr, 0
    |   %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %3:gpr <--- (1)
    |   %5:gpr = LDD killed %4:gpr, 0       ^^^^^^^^^^^^^
    |   STD killed %5:gpr, %1:gpr, 0        this is updated
     `----------------.
        %6:gpr = LDD %2:gpr, 0
        %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %6:gpr <--- (2)
        %8:gpr = LDD killed %7:gpr, 0       ^^^^^^^^^^^^^
        STD killed %8:gpr, %1:gpr, 0        this is updated

Instructions (1) and (2) would be updated to:

    ADD_rr %0:gpr(tied-def 0), killed %2:gpr

The `killed` mark is inherited from machine operands `killed %3:gpr`
and `killed %6:gpr` which are updated in place by `processDstReg()`.

This commit updates `processDstReg()` to reset kill marks for updated
machine operands to keep liveness information conservatively correct.

Differential Revision: https://reviews.llvm.org/D157805
2023-08-15 02:23:38 +03:00
Eduard Zingerman
e66affa17e Revert "[BPF] support for BPF_ST instruction in codegen"
This reverts commit 92e28e397d4ccf1bff075f48e22cf1e23a7d02bf.

Reverting to investigate buildbot failure reported in [1].

    field-reloc-st-imm.ll:
    *** Bad machine code: Explicit definition must be a register ***
    - function:    bar
    - basic block: %bb.0 entry (0x742f318)
    - instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", ...
    - operand 0:   3
    *** Bad machine code: Explicit definition must be a register ***
    - function:    bar
    - basic block: %bb.0 entry (0x742f318)
    - instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", ...
    - operand 0:   4
    LLVM ERROR: Found 4 machine code errors.

[1] https://lab.llvm.org/buildbot/#/builders/16/builds/52877
2023-08-11 02:23:40 +03:00
Eduard Zingerman
92e28e397d [BPF] support for BPF_ST instruction in codegen
Generate store immediate instruction when CPUv4 is enabled.
For example:

    $ cat test.c
    struct foo {
      unsigned char  b;
      unsigned short h;
      unsigned int   w;
      unsigned long  d;
    };
    void bar(volatile struct foo *p) {
      p->b = 1;
      p->h = 2;
      p->w = 3;
      p->d = 4;
    }

    $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
    ...
    0000000000000000 <bar>:
           0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
           1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
           2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
           3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
           4:	95 00 00 00 00 00 00 00	exit

Take special care to:
- apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST
- validate immediate value when BPF_ST write is 64-bit:
  BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with
  sign extension. Thus it is fine to generate such write when
  immediate is -1, but it is incorrect to generate such write when
  immediate is +0xffff_ffff.

Differential Revision: https://reviews.llvm.org/D140804
2023-08-11 02:07:29 +03:00
Tamir Duberstein
055893beac
[BPF] Don't crash on missing line info
When compiling Rust code we may end up with calls to functions provided
by other code units. Presently this code crashes on a null pointer
dereference - this patch avoids that crash and adds a test.

Reviewed By: ast

Differential Revision: https://reviews.llvm.org/D156446
2023-08-03 09:18:12 -04:00
Tamir Duberstein
59afd29899 [BPF] Match CHECK w/ LLVM_ENABLE_ASSERTIONS=OFF (D156136)
2023-08-01 11:12:43 +09:00
Tamir Duberstein
d542a56c1c [BPF] Clean up SelLowering
This patch contains a number of uncontroversial changes:
- Replace all uses of
  `errs`, `assert`, `llvm_unreachable` with `report_fatal_error` with
  informative error strings.
- Replace calls to `fail` in loops with at most one call per error
  instance. Previously a function with 19 arguments would log "too many
  args" 14 times. This was not helpful.
- Change one `if (..) switch ...` to `if (..) { switch ...`. The added
  brace is consistent with a near-identical switch immediately above.
- Elide one `SDValue` copy by using a reference rather than value. This
  is consistent with a variable declared immediately before it.
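
The "19 arguments, 14 messages" effect from the second item can be
reproduced with a toy C model. It assumes a limit of 5 register
arguments, which matches the numbers quoted above (19 - 5 = 14); the
function names are hypothetical and the counters stand in for calls to
`fail`.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_MAX_REG_ARGS = 5 }; /* assumed limit, see lead-in */

/* Old shape: one diagnostic for every argument past the limit. */
size_t diagnostics_per_argument(size_t num_args) {
    size_t reported = 0;
    for (size_t i = 0; i < num_args; ++i)
        if (i >= TOY_MAX_REG_ARGS)
            ++reported; /* stands in for a call to fail(...) */
    return reported;
}

/* New shape: at most one diagnostic per error instance. */
size_t diagnostics_per_error(size_t num_args) {
    return num_args > TOY_MAX_REG_ARGS ? 1 : 0;
}

/* Illustrative check using the numbers from the commit message. */
void check_too_many_args_counts(void) {
    assert(diagnostics_per_argument(19) == 14);
    assert(diagnostics_per_error(19) == 1);
}
```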

Reviewed By: yonghong-song

Differential Revision: https://reviews.llvm.org/D156136
2023-08-01 00:31:12 +03:00
Yonghong Song
6c412b6c6f [BPF] Add a few new insns under cpu=v4
In [1], a few new insns are proposed to expand the BPF ISA by
  . fixing the limitation of existing insns (e.g., 16bit jmp offset)
  . adding new insns which may improve code quality
    (sign_ext_ld, sign_ext_mov, st)
  . making the ISA feature complete (sdiv, smod)
  . providing a better user experience (bswap)

This patch implemented insn encoding for
  . sign-extended load
  . sign-extended mov
  . sdiv/smod
  . bswap insns
  . unconditional jump with 32bit offset

The new bswap insns are generated under cpu=v4 for __builtin_bswap.
For cpu=v3 or earlier, for __builtin_bswap, be or le insns are generated
which is not intuitive for the user.

To support 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented.
For a conditional branch whose target is beyond the 16-bit offset range,
llvm will do the transformation 'cond_jmp' -> 'cond_jmp + jmpl' to
simulate a 32bit conditional jmp. See BPFMIPeephole.cpp for details. The
algorithm is heuristic based. I have tested bpf selftest pyperf600 with
unroll count 600, which can indeed generate a 32-bit jump insn, e.g.,
        13:       06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619>
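
The range check behind the choice between a 16-bit and a 32-bit jump can
be sketched in C. This is only an illustration of the signed offset
ranges; the helper names are hypothetical and the actual heuristic lives
in BPFMIPeephole.cpp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a jump offset fits the signed 16-bit offset field. */
bool fits_jmp_off16(int64_t off) {
    return off >= INT16_MIN && off <= INT16_MAX;
}

/* True when a jump offset fits the signed 32-bit field used by gotol. */
bool fits_jmp_off32(int64_t off) {
    return off >= INT32_MIN && off <= INT32_MAX;
}

/* Illustrative check with the offset from the gotol line above. */
void check_gotol_example(void) {
    assert(!fits_jmp_off16(0xcd9b)); /* 52635 > 32767 */
    assert(fits_jmp_off32(0xcd9b));
}
```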

Eduard is working on adding the 'st' insn to cpu=v4.

A list of llc flags:
  disable-ldsx, disable-movsx, disable-bswap,
  disable-sdiv-smod, disable-gotol
can be used to disable a particular insn for cpu v4.
For example, user can do:
  llc -march=bpf -mcpu=v4 -disable-movsx t.ll
to enable cpu v4 without movsx insns.

References:
  [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/

Differential Revision: https://reviews.llvm.org/D144829
2023-07-26 08:37:30 -07:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
Eduard Zingerman
18e13739b8 [BPF] Undo transformation for LICM.cpp:hoistMinMax()
Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation
that undoes LICM hoistMinMax pass.

The undo transformation converts the following patterns:

    x < min(a, b) -> x < a && x < b
    x > min(a, b) -> x > a || x > b
    x < max(a, b) -> x < a || x < b
    x > max(a, b) -> x > a && x > b

Where 'a' or 'b' is a constant.
Also supports `sext min(...) ...` and `zext min(...) ...`.
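
The four rewrites can be checked by brute force over a small range of
integers. The C sketch below is an illustration only; it does not model
the pass itself, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }

/* True when all four rewrites agree with the min/max forms for x, a, b. */
bool sink_min_max_rewrites_hold(int x, int a, int b) {
    return ((x < min_i(a, b)) == (x < a && x < b)) &&
           ((x > min_i(a, b)) == (x > a || x > b)) &&
           ((x < max_i(a, b)) == (x < a || x < b)) &&
           ((x > max_i(a, b)) == (x > a && x > b));
}

/* Exhaustively check every (x, a, b) with lo <= x, a, b <= hi. */
bool rewrites_hold_on_range(int lo, int hi) {
    for (int x = lo; x <= hi; ++x)
        for (int a = lo; a <= hi; ++a)
            for (int b = lo; b <= hi; ++b)
                if (!sink_min_max_rewrites_hold(x, a, b))
                    return false;
    return true;
}

/* Illustrative check over a small symmetric range. */
void check_sink_min_max(void) {
    assert(rewrites_hold_on_range(-8, 8));
}
```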

~~~

This was previously committed as 09feee559a29 and reverted in
0bf9bfeacc8c because of the testbot memory leak report:
  https://lab.llvm.org/buildbot/#/builders/5/builds/34931

The memory leak issue was caused by incorrect instruction removal
sequence in sinkMinMaxBB():

    I->dropAllReferences();  -------->  I->eraseFromParent();
    I->removeFromParent();   fixed to

Differential Revision: https://reviews.llvm.org/D147990
2023-07-11 22:30:34 +03:00
Eduard Zingerman
0bf9bfeacc Revert "[BPF] Undo transformation for LICM.cpp:hoistMinMax()"
This reverts commit 09feee559a294611257ee157dba039fb05fe4f68.

Revert because of a testbot failure:
  https://lab.llvm.org/buildbot/#/builders/5/builds/34931
2023-07-07 04:01:31 +03:00
Eduard Zingerman
09feee559a [BPF] Undo transformation for LICM.cpp:hoistMinMax()
Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation
that undoes LICM hoistMinMax pass.

The undo transformation converts the following patterns:

    x < min(a, b) -> x < a && x < b
    x > min(a, b) -> x > a || x > b
    x < max(a, b) -> x < a || x < b
    x > max(a, b) -> x > a && x > b

Where 'a' or 'b' is a constant.
Also supports `sext min(...) ...` and `zext min(...) ...`.

Differential Revision: https://reviews.llvm.org/D147990
2023-07-06 16:19:59 +03:00
Eduard Zingerman
6a6db74b77 [BPF] Propagate NoMerge attribute when lowering function calls
`NoMerge` attribute on machine instructions prevents certain
transformations from merging these instructions.
One such transformation is implemented in 'llvm/lib/CodeGen/BranchFolding.cpp'.

This attribute should be copied from IR `call` instructions to machine
level instructions. See `X86TargetLowering::LowerCall` as another
example.

Differential Revision: https://reviews.llvm.org/D152987
2023-06-27 01:15:45 +03:00
Fangrui Song
2a61ceddb3 [BPF] Remove unused legacy passes after TargetMachine::adjustPassManager removal
D137796 made these passes unused.

`opt --bpf-ir-peephole` is specified in one test. Add a `registerPipelineParsingCallback`
so that we can change the test to use `opt --passes=bpf-ir-peephole` instead.
2023-06-24 22:44:06 -07:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Eduard Zingerman
8f906bec79 [BPF] Make sure ALU32 feature is set in MCSubtargetInfo for mcpu=v3
`BPF.td` is used to generate (among other things) `MCSubtargetInfo`
setup function for BPF target.
Specifically, the `BPFGenSubtargetInfo.inc` file:

    enum {
      ALU32 = 0,
      ...
    };
    ...
    extern const llvm::SubtargetSubTypeKV BPFSubTypeKV[] = {
      { "generic", { { { 0x0ULL, ... } } }, ... },
      { "probe",   { { { 0x0ULL, ... } } }, ... },
      { "v1",      { { { 0x0ULL, ... } } }, ... },
      { "v2",      { { { 0x0ULL, ... } } }, ... },
      { "v3",      { { { 0x1ULL, ... } } }, ... },
    };
    ...
    static inline MCSubtargetInfo *createBPFMCSubtargetInfoImpl(...) {
      return new BPFGenMCSubtargetInfo(..., BPFSubTypeKV, ...);
    }

The `SubtargetSubTypeKV` is defined in `MCSubtargetInfo.h` as:

    /// Used to provide key value pairs for feature and CPU bit flags.
    struct SubtargetSubTypeKV {
      const char *Key;                      ///< K-V key string
      FeatureBitArray Implies;              ///< K-V bit mask
      FeatureBitArray TuneImplies;          ///< K-V bit mask
      const MCSchedModel *SchedModel;
      ...
    }

The first bit array specifies features enabled by default for a
specific CPU. This commit makes sure that this information is
communicated to `tablegen` and correct `BPFSubTypeKV` table is
generated. This allows tools like `objdump` to detect available
features when `--mcpu` flag is specified.
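
How a tool could consult such a table can be sketched with a toy C
model. The struct and function below are hypothetical simplifications
(a single 64-bit mask instead of `FeatureBitArray`); only the CPU names
and bit values are taken from the generated table shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_FEATURE_ALU32 = 0 }; /* bit index, as in the enum above */

/* Toy stand-in for SubtargetSubTypeKV: CPU name plus implied features. */
struct toy_subtype_kv {
    const char *key;
    uint64_t implies;
};

static const struct toy_subtype_kv toy_bpf_subtypes[] = {
    {"generic", 0x0}, {"probe", 0x0}, {"v1", 0x0}, {"v2", 0x0}, {"v3", 0x1},
};

/* True when the named CPU enables the feature bit by default. */
bool toy_cpu_implies(const char *cpu, unsigned feature_bit) {
    size_t n = sizeof toy_bpf_subtypes / sizeof toy_bpf_subtypes[0];
    for (size_t i = 0; i < n; ++i)
        if (strcmp(toy_bpf_subtypes[i].key, cpu) == 0)
            return ((toy_bpf_subtypes[i].implies >> feature_bit) & 1u) != 0;
    return false; /* unknown CPU: no features */
}

/* Illustrative check: only v3 implies ALU32 in the table above. */
void check_alu32_lookup(void) {
    assert(toy_cpu_implies("v3", TOY_FEATURE_ALU32));
    assert(!toy_cpu_implies("v2", TOY_FEATURE_ALU32));
}
```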

Differential Revision: https://reviews.llvm.org/D148037
2023-04-17 20:08:45 +03:00
Momchil Velikov
4ac6f99ae0 [LiveInterval] Fix live range overlap check
Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D145707
2023-04-11 11:11:30 +01:00
Eduard Zingerman
d0d1431ab1 [BPF] Fix assembly parsing errors for atomic_fetch_* instructions
Fixes BPF assembler parsing errors for the following instructions:
- atomic_fetch_add
- atomic_fetch_and
- atomic_fetch_xor
- atomic_fetch_or
- cmpxchg32_32
- cmpxchg_64
- xchg32_32
- xchg_64

Also add a test to verify that all instructions could be assembled and disassembled.

Differential Revision: https://reviews.llvm.org/D147421
2023-04-05 00:55:32 +03:00
Yonghong Song
db3d2adecb [BPF] Improve pruning to avoid generate more types in BTF
Commit 3671bdbcd214("[BPF] Fix a BTF type pruning bug") fixed a
pruning bug to allow generating more types. But that commit has a bug
which permits generating more types than necessary. The following
is an example to illustrate the problem.

   struct t1 {
     int a;
   };
   struct t2 {
     struct t1 *p1;
     struct t1 *p2;
     int b;
   };
   int foo(struct t2 *arg) {
     return arg->b;
   }

The following is the part of BTF generation sequence:
  (1). 'struct t2 *arg' -> 'struct t1 *p1'
       In this step, the type 'struct t1' will be generated as
       a forward decl and the ptr type (to 'struct t1') will
       be stored in the internal type table.
  (2). now the second field 'struct t1 *p2' will be processed.
       Since the ptr type (to 'struct t1') is already in the type
       table, the existing logic strips out ptr modifier and
       is able to generate BTF type for 'struct t1'.

In the above step (2), if CheckPointer is true (the type traversal
chain includes a struct member), the 'ptr' modifier should be checked
and the subsequent type generation should be skipped, since
the same case has been processed in visitDerivedType().

The issue was exposed when I tried to use llvm15 to compile
some internal bpf programs. The bpf skeleton puts the whole
ELF section (after stripping some sections like dwarf) into a string.
The large BTF section triggered the following error:

  bpf_object_with_struct_ops_test_prog_bpf/BpfObjectWithStructOpsTestProg.skel.h:222:23:
  error: string literal of length 140144 exceeds maximum length 65536 that C++ compilers
  are required to support [-Werror,-Woverlength-strings]
        return (const void *)"\
                             ^~
  1 error generated.

Although adding -Wno-overlength-strings could work around the issue,
improving llvm BTF generation sounds better, esp. for users using vmlinux.h.

Differential Revision: https://reviews.llvm.org/D145816
2023-03-13 09:34:37 -07:00
Andrew Savonichev
c65b4d64d4 [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2023-02-09 18:45:20 +03:00
Eduard Zingerman
f60aefdc7f [BPF] generate btf_decl_tag records for params of extern functions
After frontend changes in the following commit:
"BPF: preserve btf_decl_tag for parameters of extern functions"
same mechanics could be used to get the list of function parameters
and associated btf_decl_tag entries for both extern and non-extern
functions.

This commit extracts these mechanics into a separate auxiliary function
BTFDebug::processDISubprogram(). The function is called for both
extern and non-extern functions in order to generate the corresponding
BTF_DECL_TAG records.

Differential Revision: https://reviews.llvm.org/D140971
2023-01-07 09:32:18 -08:00
Eduard Zingerman
ed068386b4 [BPF] Use SectionForGlobal() for section names computation in BTF
Use function TargetLoweringObjectFile::SectionForGlobal() to compute
section names for globals described in BTF_KIND_DATASEC records.

This fixes a discrepancy in section name computation between
BTFDebug::processGlobals and the rest of the LLVM pipeline.

Specifically, the following example illustrates the discrepancy
before this commit:

  struct Foo {
    int i;
  } __attribute__((aligned(16)));
  struct Foo foo = { 0 };

The initializer for 'foo' looks as follows:

  %struct.Foo { i32 0, [12 x i8] undef }

TargetLoweringObjectFile::SectionForGlobal() classifies 'foo' as
a part of '.bss' section, while BTFDebug::processGlobals
classified it as a part of '.data' section because of the
following expression:

  SecName = Global.getInitializer()->isZeroValue() ? ".bss" : ".data"

The isZeroValue() returns false because of the undef tail of the
initializer, while SectionForGlobal() allows such patterns in '.bss'.
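
The discrepancy can be illustrated with a toy C model of the two
classifications. Everything here is a hypothetical simplification: an
initializer is modeled as an array of element kinds, the "old" function
mirrors the `isZeroValue()` expression quoted above, and the "new"
function assumes only what the message states, namely that zero or undef
elements are allowed in '.bss'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_elem { TOY_ZERO, TOY_UNDEF, TOY_NONZERO };

/* Strict test: every element is an explicit zero. */
bool toy_all_zero(const enum toy_elem *e, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (e[i] != TOY_ZERO)
            return false;
    return true;
}

/* Lenient test: every element is zero or undef (assumption, see lead-in). */
bool toy_all_zero_or_undef(const enum toy_elem *e, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (e[i] == TOY_NONZERO)
            return false;
    return true;
}

const char *toy_section_old(const enum toy_elem *e, size_t n) {
    return toy_all_zero(e, n) ? ".bss" : ".data";
}

const char *toy_section_new(const enum toy_elem *e, size_t n) {
    return toy_all_zero_or_undef(e, n) ? ".bss" : ".data";
}

/* Illustrative check for `%struct.Foo { i32 0, [12 x i8] undef }`. */
void check_foo_section(void) {
    const enum toy_elem foo[] = {TOY_ZERO, TOY_UNDEF};
    assert(strcmp(toy_section_old(foo, 2), ".data") == 0);
    assert(strcmp(toy_section_new(foo, 2), ".bss") == 0);
}
```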

Differential Revision: https://reviews.llvm.org/D140505
2022-12-29 11:27:19 -08:00