86 Commits

Author SHA1 Message Date
Andrew Rogers
19658d1474
[llvm] annotate interfaces in llvm/Target for DLL export (#143615)
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

A subset of these changes was generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.

The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementations is required here
because these files do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for these functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it, I was concerned it would unnecessarily impact compile times.
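
For illustration, a minimal sketch of the pattern (the target name and the
macro's stand-in expansion below are hypothetical, not the actual LLVM
definitions):

```cpp
// Hypothetical target "Foo"; the real LLVM_ABI comes from llvm/Support/Compiler.h.
// The stand-in definition below only hints at what the macro can expand to.
#ifndef LLVM_ABI
#if defined(_WIN32) && defined(BUILDING_LLVM_DLL) // illustrative guard only
#define LLVM_ABI __declspec(dllexport)
#else
#define LLVM_ABI
#endif
#endif

// Definition in a hypothetical llvm/lib/Target/Foo/TargetInfo/FooTargetInfo.cpp.
// The annotation sits on the definition because TargetSelect.h (which carries
// the annotated declaration) is intentionally not included here.
extern "C" LLVM_ABI void LLVMInitializeFooTargetInfo() {
  // ... register the Foo target with the TargetRegistry ...
}
```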

In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.

## Validation

Validated with local builds and tests for cross-platform compatibility.
This covered llvm, clang, and lldb in the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-17 13:28:45 -07:00
yonghong-song
ab391beb11
[BPF] Handle traps with kfunc call __bpf_trap (#131731)
Currently, the middle end generates an 'unreachable' insn if the compiler
determines the code is indeed unreachable or the code becomes invalid
due to some optimization (e.g. code optimization with uninitialized
variables).

Right now the BPF backend ignores the 'unreachable' insn during SelectionDAG
lowering. For cases where 'unreachable' is due to an invalid code
transformation, such a signal will be missed. Later on, users need
some effort to debug it, which impacts developer productivity.

This patch enables SelectionDAG lowering for the 'unreachable' insn.

A previous attempt ([1]) tried to use a backend IR pass to filter
out 'unreachable' insns in a number of cases. But such pattern
matching may misalign with future middle-end optimizations involving
'unreachable' insns.

This patch takes a different approach. The 'unreachable' insn is
lowered with a special encoding in the bpf object file and the verifier
will do proper verification of the bpf prog. More specifically,
the 'unreachable' insn is replaced by a __bpf_trap() function.
This function will be a kfunc (in the ".ksyms" section) with a weak
attribute, but does not have a definition. The actual kfunc definition
is expected to be in the kernel. The __bpf_trap() extern function
is also encoded in BTF. The name __bpf_trap() is chosen to satisfy
the reserved identifier requirement.

Besides the uninitialized variable case, the builtin function
'__builtin_trap' can also generate the kfunc __bpf_trap(). For example,
in [3] we have
```
  # define __bpf_unreachable()  __builtin_trap()
```
If the compiler did not remove __builtin_trap() during middle-end
optimization, compilation would fail.

With this patch, compilation will not fail and __builtin_trap()
is converted to the __bpf_trap() kfunc. The eventual failure will be
in the verifier instead of llvm compilation. To keep a compile-time
failure, users can add an option like `-ftrap-function=<something>`.

I tested this patch on bpf selftests and all tests passed.
I also tried the original example in [2] and the code looks like below:
```
          ; {
               0:       bf 16 00 00 00 00 00 00 r6 = r1
          ;       bpf_printk("Start");
               1:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
                        0000000000000008:  R_BPF_64_64  .rodata
               3:       b4 02 00 00 06 00 00 00 w2 = 0x6
               4:       85 00 00 00 06 00 00 00 call 0x6
          ; DEFINE_FUNC_CTX_POINTER(data)
               5:       61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c)
          ;       bpf_printk("pre ipv6_hdrlen_offset");
               6:       18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll
                        0000000000000030:  R_BPF_64_64  .rodata
               8:       b4 02 00 00 17 00 00 00 w2 = 0x17
               9:       85 00 00 00 06 00 00 00 call 0x6
              10:       85 10 00 00 ff ff ff ff call -0x1
                        0000000000000050:  R_BPF_64_32  __bpf_trap
              11:       95 00 00 00 00 00 00 00 exit
          <END>
```
Eventually kernel verifier will emit the following logs:
```
      10: (85) call __bpf_trap#74479
      unexpected __bpf_trap() due to uninitialized variable?
```
In another internal sched-ext bpf prog, with the patch we have bpf code:
```
  Disassembly of section .text:
  0000000000000000 <scx_storage_init_single>:
  ; {
       0:       bc 13 00 00 00 00 00 00 w3 = w1
       1:       b4 01 00 00 00 00 00 00 w1 = 0x0
  ;       const u32 zero = 0;
  ...
  0000000000003a80 <create_dom>:
  ; {
    1872:       bc 16 00 00 00 00 00 00 w6 = w1
  ;       bpf_printk("dom_id %d", dom_id);
    1873:       18 01 00 00 3f 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x3f ll
                0000000000003a88:  R_BPF_64_64  .rodata
    1875:       b4 02 00 00 0a 00 00 00 w2 = 0xa
    1876:       bc 63 00 00 00 00 00 00 w3 = w6
    1877:       85 00 00 00 06 00 00 00 call 0x6
  ;       ret = scx_bpf_create_dsq(dom_id, 0);
    1878:       bc 61 00 00 00 00 00 00 w1 = w6
    1879:       b4 02 00 00 00 00 00 00 w2 = 0x0
    1880:       85 10 00 00 ff ff ff ff call -0x1
                0000000000003ac0:  R_BPF_64_32  scx_bpf_create_dsq
  ;       domc->node_cpumask = node_data[node_id];
    1881:       85 10 00 00 ff ff ff ff call -0x1
                0000000000003ac8:  R_BPF_64_32  __bpf_trap
    1882:       95 00 00 00 00 00 00 00 exit
    <END>
```
The verifier can easily report the error too.

A bpf flag `-bpf-disable-trap-unreachable` is introduced to disable
trapping for 'unreachable' or __builtin_trap.

  [1] https://github.com/llvm/llvm-project/pull/126858
  [2] https://github.com/msune/clang_bpf/blob/main/Makefile#L3
  [3] https://github.com/libbpf/libbpf/blob/master/src/bpf_helpers.h
2025-05-27 13:34:15 -07:00
Matthias Braun
675cb70641
Register assembly printer passes (#138348)
Register assembly printer passes in the pass registry.

This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.

Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. This currently has a default parameter so it won't break
any targets that have not been updated.
2025-05-06 18:01:17 -07:00
Sergei Barannikov
bb1765179e
[TTI] Simplify implementation (NFCI) (#136674)
Replace "concept based polymorphism" with simpler PImpl idiom.

This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now there are two fewer :)
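
As an illustration only, here is a small self-contained sketch of the PImpl
shape described above; the class names are simplified stand-ins, not the
actual LLVM types:

```cpp
#include <iostream>
#include <memory>

// Stand-in for TargetTransformInfoImplBase: a polymorphic implementation base
// with overridable defaults.
struct ImplBase {
  virtual ~ImplBase() = default;
  virtual unsigned getInstructionCost() const { return 1; } // generic default
};

// Stand-in for a target implementation; `override` turns signature mismatches
// into compile errors instead of silent method hiding.
struct FooImpl : ImplBase {
  unsigned getInstructionCost() const override { return 3; }
};

// Stand-in for TargetTransformInfo: a thin facade that owns the impl (PImpl)
// and forwards calls to it.
class TTI {
  std::unique_ptr<ImplBase> Impl;
public:
  explicit TTI(std::unique_ptr<ImplBase> I) : Impl(std::move(I)) {}
  unsigned getInstructionCost() const { return Impl->getInstructionCost(); }
};

int main() {
  TTI T(std::make_unique<FooImpl>());
  std::cout << T.getInstructionCost() << "\n"; // prints 3
}
```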

There are three commits to hopefully simplify the review.

The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.

The first commit makes `TargetTransformInfoImplBase` polymorphic, which
means all derived classes should `override` its methods. This is done in
the second commit to keep the first one smaller. It appeared infeasible to
extract this into a separate PR because landing the first commit
separately would result in tons of `-Woverloaded-virtual` warnings (and
break `-Werror` builds).

The third commit eliminates `TTI::Concept` by merging it with its only
derived class, `TargetTransformInfoImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes the `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.

Pull Request: https://github.com/llvm/llvm-project/pull/136674
2025-04-26 15:25:40 +03:00
Rahul Joshi
1356e202b2
[NFC][LLVM][BPF] Cleanup pass initialization for BPF (#134414)
- Remove calls to pass initialization from pass constructors and move
them to target initialization.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:26 -07:00
Kazu Hirata
ed8019d9fb
[Target] Remove unused includes (NFC) (#116577)
Identified with misc-include-cleaner.
2024-11-18 07:19:50 -08:00
Matin Raayai
bb3f5e1fed
Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234)
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasize its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.

cc @arsenm @aeubanks
2024-11-14 13:30:05 -08:00
Shilei Tian
dc45ff1d2a
[PassBuilder] Add ThinOrFullLTOPhase to early simplication EP call backs (#114547)
The early simplification pipeline is used in non-LTO mode and in the
(Thin/Full)LTO pre-link stage. There are some passes that we want in
non-LTO mode but not at the LTO pre-link stage, and that control is
currently missing. This PR adds the support. To demonstrate the use, we
enable the internalization pass only in non-LTO mode for AMDGPU, because
having it run in the pre-link stage causes some issues.
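
As a rough sketch of how a target might use this (the callback signature and
the pass choice are assumptions for illustration, not the exact AMDGPU
change; `PB` is a `PassBuilder` instance):

```cpp
PB.registerPipelineEarlySimplificationEPCallback(
    [](ModulePassManager &MPM, OptimizationLevel Level,
       ThinOrFullLTOPhase Phase) {
      // Run only in the non-LTO pipeline, not at the LTO pre-link stage.
      if (Phase == ThinOrFullLTOPhase::None)
        MPM.addPass(InternalizePass()); // illustrative pass choice
    });
```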
2024-11-03 23:24:10 -05:00
yonghong-song
06c531e808
BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 (#106494)
This patch contains two parts:
- first, revert the patch https://github.com/llvm/llvm-project/pull/101428.
- second, remove the `atomic_fetch_and_*()` to `atomic_<op>()`
  conversion (when the return value is not used), but preserve
  lowering `__sync_fetch_and_add()` to a locked insn with cpu v1/v2.
2024-08-30 14:00:33 -07:00
yonghong-song
c566769d7c
BPF: Ensure __sync_fetch_and_add() always generate atomic_fetch_add insn (#101428)
Peilen Ye reported an issue ([1]) where for __sync_fetch_and_add(...)
without return value like
  __sync_fetch_and_add(&foo, 1);
llvm BPF backend generates locked insn e.g.
  lock *(u32 *)(r1 + 0) += r2

If __sync_fetch_and_add(...) returns a value like
  res = __sync_fetch_and_add(&foo, 1);
llvm BPF backend generates like
  r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)

The above generation of 'lock *(u32 *)(r1 + 0) += r2' caused a problem
in the jit since a proper barrier is not inserted.

The above discrepancy is due to commit [2], which tries to maintain
backward compatibility since, before commit [2],
__sync_fetch_and_add(...) generated a lock insn in the BPF backend.

Based on discussion in [1], now it is time to fix the above discrepancy
so we can have proper barrier support in jit. This patch made sure that
__sync_fetch_and_add(...) always generates atomic_fetch_add(...) insns.
Now 'lock *(u32 *)(r1 + 0) += r2' can only be generated by inline asm. I
also removed the whole BPFMIChecking.cpp file whose original purpose is
to detect and issue errors if XADD{W,D,W32} may return a value used
subsequently. Since insns XADD{W,D,W32} are all inline asm only now,
such error detection is not needed.

[1]
https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
[2]
286daafd65

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
2024-08-04 21:03:16 -07:00
Nikita Popov
5cd0ba30f5
Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321) (#96462)
On MSVC the `this` uses inside `decltype` require a lambda capture. On
clang they result in an unused capture warning instead. Add the capture
and suppress the warning with `(void)this`.

-----

Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
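
A self-contained sketch of the lazy scheme (illustrative only, not the actual
LLVM code):

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Illustrative registry: initializer callbacks are registered cheaply up
// front but only run the first time a pass name is requested.
class PassNameRegistry {
  std::vector<std::function<void(PassNameRegistry &)>> Initializers;
  std::map<const void *, std::string> Names; // pass class id -> pass name
  bool Initialized = false;

  void ensureInitialized() {
    if (Initialized)
      return;
    Initialized = true;
    for (auto &Init : Initializers)
      Init(*this); // the expensive work happens here, on first use only
  }

public:
  void registerInitializer(std::function<void(PassNameRegistry &)> Init) {
    Initializers.push_back(std::move(Init)); // nothing runs yet
  }
  void addName(const void *ClassId, std::string Name) {
    Names[ClassId] = std::move(Name);
  }
  std::string lookup(const void *ClassId) {
    ensureInitialized(); // first lookup triggers all registered callbacks
    auto It = Names.find(ClassId);
    return It == Names.end() ? "<unknown>" : It->second;
  }
};

static char SomePassId; // stand-in for a pass class identifier

int main() {
  PassNameRegistry R;
  R.registerInitializer(
      [](PassNameRegistry &Reg) { Reg.addName(&SomePassId, "some-pass"); });
  std::cout << R.lookup(&SomePassId) << "\n"; // prints "some-pass"
}
```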
2024-06-24 15:00:11 +02:00
Nikita Popov
e5a41f0afc Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)"
My attempt to fix the Windows build made things worse,
revert entirely for now.

This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c.
This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5.
This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.
2024-06-24 10:32:03 +02:00
Nikita Popov
957dc4366d
[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)
Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
2024-06-24 09:40:09 +02:00
paperchalice
7652a59407
Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149)
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
2024-06-04 08:10:58 +08:00
paperchalice
8917afaf0e
Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146)
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1

It still breaks EXPENSIVE_CHECKS build. Sorry.
2024-06-02 14:31:52 +08:00
paperchalice
d2cdc8ab45
[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567)
Port selection dag isel to the new pass manager.
Only `AMDGPU` and `X86` support the new pass version. In the new pass manager, `-verify-machineinstrs` belongs to the verify instrumentation and is enabled by default.
2024-06-02 09:12:33 +08:00
paperchalice
2aa5bae0c0
[NewPM][BPF] Add BPFPassRegistry.def NFCI (#86241)
Prepare migration for dag-isel.
2024-03-23 12:53:26 +08:00
4ast
2aacb56e83
BPF address space insn (#84410)
This commit aims to support BPF arena kernel side
[feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and
userspace;
- base pointers for this memory region differ between kernel and user
spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
translates src_reg, a pointer in src_addr_space to dst_reg, equivalent
pointer in dst_addr_space, {src,dst}_addr_space are immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on arena
pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
  arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from arena is done via
pointer in user address space, thus convert base pointers using
`addr_space_cast(src_reg, 0, 1)`;

Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.

For example, the following C code:

```c
   #define __as __attribute__((address_space(1)))
   void copy(int __as *from, int __as *to) { *to = *from; }
```

Compiled to the following IR:

```llvm
    define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
    entry:
      %0 = load i32, ptr addrspace(1) %from, align 4
      store i32 %0, ptr addrspace(1) %to, align 4
      ret void
    }
```

Is transformed to:

```llvm
    %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
    %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
    %0 = load i32, ptr %from1, align 4, !tbaa !3
    store i32 %0, ptr %to2, align 4, !tbaa !3
    ret void
```

And compiled as:

```asm
    r2 = addr_space_cast(r2, 0, 1)
    r1 = addr_space_cast(r1, 0, 1)
    r1 = *(u32 *)(r1 + 0)
    *(u32 *)(r2 + 0) = r1
    exit
```

Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>
2024-03-13 02:27:25 +02:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
James Y Knight
b856e77b2d
Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703)
Targets affected:

- NVPTX and BPF: set to 64 bits.
- ARC, Lanai, and MSP430: set to 0 (they don't implement atomics).

Those which didn't yet add AtomicExpandPass to their pass pipeline now
do so.

This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass. On all these targets, this
now matches what Clang already does in the frontend.
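
For illustration, the kind of one-line change involved looks roughly like
this (the target name is hypothetical; the real patch makes the equivalent
call in each affected target's TargetLowering constructor):

```cpp
FooTargetLowering::FooTargetLowering(const TargetMachine &TM)
    : TargetLowering(TM) {
  // NVPTX and BPF declare 64 bits; ARC, Lanai, and MSP430 declare 0.
  // Anything wider than the declared size is expanded to __atomic_* libcalls
  // by AtomicExpandPass.
  setMaxAtomicSizeInBitsSupported(64);
}
```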

The only targets which do not configure AtomicExpandPass now are:
- DirectX and SPIRV: they aren't normal backends.
- AVR: a single-cpu architecture with no privileged/user divide, which
could implement all atomics by disabling/enabling interrupts, regardless
of size/alignment. Will be addressed by future work.
2024-01-08 22:34:28 -05:00
paperchalice
ffb1f20e0d
[CodeGen] Add flag to populate target pass names (#76328)
`print-pipeline-passes` can show target pass names.
2024-01-03 09:07:02 +08:00
Yingchi Long
2460bf2fac
[BPF][GlobalISel] add initial gisel support for BPF (#74999)
This adds initial GlobalISel codegen support for the BPF backend.

Only the IRTranslator for "RET" is implemented (instruction selection is not supported yet).

Depends on: #74998
2023-12-11 19:58:34 +08:00
Eduard Zingerman
030b8cb156 [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF-specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. If a register with such a tag is modified, BPF
  programs are not allowed to read or write memory using that register. See
  the kernel/bpf/verifier.c:check_mem_access function in the Linux kernel
  source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The __attribute__((preserve_static_offset)) attribute has priority over
__attribute__((preserve_access_index)). When the preserve_access_index
attribute is present, preserve access index transformations are not
applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

This is a second attempt to commit this change, previous reverted
commit is: cb13e9286b6d4e384b5d4203e853d44e2eff0f0f.
The following items had been fixed:
- test case bpf-preserve-static-offset-bitfield.c now uses
  `-triple bpfel` to avoid different codegen for little/big endian
  targets.
- BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid
  use after free for `WorkList` elements `V`.

Differential Revision: https://reviews.llvm.org/D133361
2023-12-05 19:21:42 +02:00
Eduard Zingerman
2484469803 Revert "[BPF] Attribute preserve_static_offset for structs"
This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f.
Buildbot reports MSAN failures in tests added in this commit:
https://lab.llvm.org/buildbot/#/builders/5/builds/38806

Failing tests:
  LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll
2023-11-30 22:29:45 +02:00
Eduard Zingerman
cb13e9286b [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF-specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. If a register with such a tag is modified, BPF
  programs are not allowed to read or write memory using that register. See
  the kernel/bpf/verifier.c:check_mem_access function in the Linux kernel
  source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The __attribute__((preserve_static_offset)) attribute has priority over
__attribute__((preserve_access_index)). When the preserve_access_index
attribute is present, preserve access index transformations are not
applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

Differential Revision: https://reviews.llvm.org/D133361
2023-11-30 19:45:03 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Eduard Zingerman
651e644595 [BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree()
Replace `BPFMIPeepholeTruncElim` by adding an overload for
`TargetLowering::isZExtFree()` aware that zero extension is
free for `ISD::LOAD`.

Short description
=================

The `BPFMIPeepholeTruncElim` handles two patterns:

Pattern #1:

    %1 = LDB %0, ...              %1 = LDB %0, ...
    %2 = AND_ri %1, 0xff      ->  %2 = MOV_ri %1    <-- (!)

Pattern #2:

    bb.1:                         bb.1:
      %a = LDB %0, ...              %a = LDB %0, ...
      br %bb3                       br %bb3
    bb.2:                         bb.2:
      %b = LDB %0, ...        ->    %b = LDB %0, ...
      br %bb3                       br %bb3
    bb.3:                         bb.3:
      %1 = PHI %a, %b               %1 = PHI %a, %b
      %2 = AND_ri %1, 0xff          %2 = MOV_ri %1  <-- (!)

Plus variations:
- AND_ri_32 instead of AND_ri
- SLL/SLR instead of AND_ri
- LDH, LDW, LDB32, LDH32, LDW32

Both patterns could be handled by built-in transformations at
instruction selection phase if suitable `isZExtFree()` implementation
is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`.

When evaluated on BPF kernel selftests and the remove_truncate_*.ll LLVM
test cases this revision performs slightly better than
BPFMIPeepholeTruncElim; see the "Impact" section below for details.

The commit also adds a few test cases to make sure that the patterns in
question are handled.
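
For illustration, a rough sketch of such an `isZExtFree()` overload, modeled
on the ARM version mentioned above (this is not the verbatim patch and may
differ in details):

```cpp
bool BPFTargetLowering::isZExtFree(SDValue Val, EVT VT2) const {
  EVT VT1 = Val.getValueType();
  if (!VT1.isSimple() || !VT1.isInteger() || !VT2.isSimple() || !VT2.isInteger())
    return false;
  // Narrow loads already clear the upper bits, so zero extending the loaded
  // value to i64 costs nothing.
  if (Val.getOpcode() != ISD::LOAD)
    return false;
  switch (VT1.getSimpleVT().SimpleTy) {
  default:
    return false;
  case MVT::i8:
  case MVT::i16:
  case MVT::i32:
    return VT2 == MVT::i64;
  }
}
```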

Long description
================

Why this works: Pattern #1
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command:

    ...
    Type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
          t19: i64 = and t16, Constant:i64<255>
        t17: i64 = setcc t19, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.1 t19: i64 = and t16, Constant:i64<255>
    With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized type-legalized selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
        t17: i64 = setcc t20, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- Optimized type-legalized selection DAG:
  - `t19 = and t16, 255` had been replaced by `t16` (load).
  - Patterns like `(and (load ... i8), 255)` are replaced by `load`
    in `DAGCombiner::BackwardsPropagateMask` called from
    `DAGCombiner::visitAND`.
  - Similarly patterns like `(shl (srl ..., 56), 56)` are replaced by
    `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge,
    look for `TLI.shouldFoldConstantShiftPairToMask()` call).

Why this works: Pattern #2
--------------------------

Consider the following example:

    define i1 @foo(ptr %p) {
    entry:
      %a = load i8, ptr %p, align 1
      br label %next

    next:
      %cond = icmp eq i8 %a, 0
      ret i1 %cond
    }

Consider log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command.
Log for first basic block:

    Initial selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 9 nodes:
      t0: ch,glue = EntryToken
      t3: i64 = Constant<0>
            t2: i64,ch = CopyFromReg t0, Register:i64 %1
          t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64
        t6: i64 = zero_extend t5
      t8: ch = CopyToReg t0, Register:i64 %0, t6
    ...
    Replacing.1 t6: i64 = zero_extend t5
    With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
     and 0 other values
    ...
    Optimized lowered selection DAG: %bb.0 'foo:entry'
    SelectionDAG has 7 nodes:
      t0: ch,glue = EntryToken
          t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
      t8: ch = CopyToReg t0, Register:i64 %0, t9

Note:
- Initial selection DAG:
  - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))`
    w/o special `isZExtFree()` overload added by this commit
    it is instead lowered as `t6 = (any_extend (load ...))`.
  - The decision to generate `zero_extend` or `any_extend` is
    done in `RegsForValue::getCopyToRegs` called from
    `SelectionDAGBuilder::CopyValueToVirtualRegister`:
    - if `isZExtFree()` for load returns true `zero_extend` is used;
    - `any_extend` is used otherwise.
- Optimized lowered selection DAG:
  - `t6 = (any_extend (load ...))` is replaced by
    `t9 = load ..., zext from i8`
    This is done by `DagCombiner.cpp:tryToFoldExtOfLoad()` called from
    `DAGCombiner::visitZERO_EXTEND`.

Log for second basic block:

    Initial selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
                t2: i64,ch = CopyFromReg t0, Register:i64 %0
              t4: i64 = AssertZext t2, ValueType:ch:i8
            t5: i8 = truncate t4
          t8: i1 = setcc t5, Constant:i8<0>, seteq:ch
        t9: i64 = any_extend t8
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Replacing.2 t18: i64 = and t4, Constant:i64<255>
    With: t4: i64 = AssertZext t2, ValueType:ch:i8
    ...
    Type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 13 nodes:
      t0: ch,glue = EntryToken
              t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t4: i64 = AssertZext t2, ValueType:ch:i8
          t18: i64 = and t4, Constant:i64<255>
        t16: i64 = setcc t18, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...
    Optimized type-legalized selection DAG: %bb.1 'foo:next'
    SelectionDAG has 11 nodes:
      t0: ch,glue = EntryToken
            t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t4: i64 = AssertZext t2, ValueType:ch:i8
        t16: i64 = setcc t4, Constant:i64<0>, seteq:ch
      t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
      t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
    ...

Note:
- Initial selection DAG:
  - `t0` is an input value for this basic block; it corresponds to the load
    instruction (`t9`) from the first basic block.
  - It is accessed within basic block via
    `t4` (AssertZext (CopyFromReg t0, ...)).
  - The `AssertZext` is generated by RegsForValue::getCopyFromRegs
    called from SelectionDAGBuilder::getCopyFromRegs, it is generated
    only when `LiveOutInfo` with known number of leading zeros is
    present for `t0`.
  - Known register bits in `LiveOutInfo` are computed by
    `SelectionDAG::computeKnownBits` called from
    `SelectionDAGISel::ComputeLiveOutVRegInfo`.
  - `computeKnownBits()` generates leading zeros information for
    `(load ..., zext from ...)` but *does not* generate leading zeros
    information for `(load ..., anyext from ...)`.
    This is why `isZExtFree()` added in this commit is important.
- Type-legalized selection DAG:
  - `t5 = truncate t4` is replaced by `t18 = and t4, 255`
- Optimized type-legalized selection DAG:
  - `t18 = and t4, 255` is replaced by `t4`, this is done by
    `DAGCombiner::SimplifyDemandedBits` called from
    `DAGCombiner::visitAND`, which simplifies patterns like
    `(and (assertzext ...))`

Impact
------

This change covers all remove_truncate_*.ll test cases:
- for -mcpu=v4 there are no changes in the generated code;
- for -mcpu=v2 code generated for remove_truncate_7 and
  remove_truncate_8 improved slightly, for other tests it is
  unchanged.

For remove_truncate_7:

    Before this revision                 After this revision
    --------------------                 -------------------
        r1 <<= 0x20                          r1 <<= 0x20
        r1 >>= 0x20                          r1 >>= 0x20
        if r1 == 0x0 goto +0x2 <LBB0_2>      if r1 == 0x0 goto +0x2 <LBB0_2>
        r1 = *(u32 *)(r2 + 0x0)              r0 = *(u32 *)(r2 + 0x0)
        goto +0x1 <LBB0_3>                   goto +0x1 <LBB0_3>
    <LBB0_2>:                            <LBB0_2>:
        r1 = *(u32 *)(r2 + 0x4)              r0 = *(u32 *)(r2 + 0x4)
    <LBB0_3>:                            <LBB0_3>:
        r0 = r1                              exit
        exit

For remove_truncate_8:

    Before this revision                 After this revision
    --------------------                 -------------------
        r2 = *(u32 *)(r1 + 0x0)              r2 = *(u32 *)(r1 + 0x0)
        r3 = r2                              r3 = r2
        r3 <<= 0x20                          r3 <<= 0x20
        r4 = r3                              r3 s>>= 0x20
        r4 s>>= 0x20
        if r4 s> 0x2 goto +0x5 <LBB0_3>      if r3 s> 0x2 goto +0x4 <LBB0_3>
        r4 = *(u32 *)(r1 + 0x4)              r3 = *(u32 *)(r1 + 0x4)
        r3 >>= 0x20
        if r3 >= r4 goto +0x2 <LBB0_3>       if r2 >= r3 goto +0x2 <LBB0_3>
        r2 += 0x2                            r2 += 0x2
        *(u32 *)(r1 + 0x0) = r2              *(u32 *)(r1 + 0x0) = r2
    <LBB0_3>:                            <LBB0_3>:
        r0 = 0x3                             r0 = 0x3
        exit                                 exit

For kernel BPF selftests the statistics are as follows:
- For -mcpu=v4: 9 out of 655 object files have differences;
  in all cases the total number of instructions marginally decreased
  (-27 instructions).
- For -mcpu=v2: 9 out of 655 object files have differences:
  - For 19 object files the number of instructions decreased
    (-129 instructions in total): some redundant `rX &= 0xffff`
    and register-to-register assignments were removed;
  - For 2 object files the number of instructions increased by 2
    instructions in each file.

Both -mcpu=v2 instruction increases could be reduced to the same
example:

    define void @foo(ptr %p) {
    entry:
      %a = load i32, ptr %p, align 4
      %b = sext i32 %a to i64
      %c = icmp ult i64 1, %b
      br i1 %c, label %next, label %end

    next:
      call void inttoptr (i64 62 to ptr)(i32 %a)
      br label %end

    end:
      ret void
    }

Note that this example uses value loaded to `%a` both as a sign
extended (`%b`) and as zero extended (`%a` passed as parameter).
Here is the difference in final assembly code:

    Before this revision          After this revision
    --------------------          -------------------
        r1 = *(u32 *)(r1 + 0)         r1 = *(u32 *)(r1 + 0)
        r1 <<= 32                     r1 <<= 32
        r1 s>>= 32                    r1 s>>= 32
        if r1 < 2 goto <LBB0_2>       if r1 < 2 goto <LBB0_2>
                                      r1 <<= 32
                                      r1 >>= 32
        call 62                       call 62
    <LBB0_2>:                     <LBB0_2>:
        exit                          exit

Before this commit `%a` is passed to call as a sign extended value,
after this commit `%a` is passed to call as a zero extended value,
both are correct as 32-bit sub-register is the same.

The difference comes from `DAGCombiner` operation on the initial DAG:

Initial selection DAG before this commit:

    t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
          t6: i64 = any_extend t5         <--------------------- (1)
        t8: ch = CopyToReg t0, Register:i64 %0, t6
            t9: i64 = sign_extend t5
          t12: i1 = setcc Constant:i64<1>, t9, setult:ch

Initial selection DAG after this commit:

    t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
          t6: i64 = zero_extend t5        <--------------------- (2)
        t8: ch = CopyToReg t0, Register:i64 %0, t6
            t9: i64 = sign_extend t5
          t12: i1 = setcc Constant:i64<1>, t9, setult:ch

The node `t9` is processed before node `t6` and `load` instruction is
combined to load with sign extension:

    Replacing.1 t9: i64 = sign_extend t5
    With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64
     and 0 other values
    Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
    With: t31: i32 = truncate t30
     and 1 other values

This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from
`DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6` which
is `any_extend` in (1) and `zero_extend` in (2).
`tryToFoldExtOfLoad()` rewrites such uses of `t5` differently:
- `any_extend` is simply removed
- `zero_extend` is replaced by `and t30, 0xffffffff`, which is later
  converted to a pair of shifts. This pair of shifts survives till the
  end of translation.

Differential Revision: https://reviews.llvm.org/D157870
2023-08-22 00:04:51 +03:00
Fangrui Song
2a61ceddb3 [BPF] Remove unused legacy passes after TargetMachine::adjustPassManager removal
D137796 made these passes unused.

`opt --bpf-ir-peephole` is specified in one test. Add a `registerPipelineParsingCallback`
so that we can change the test to use `opt --passes=bpf-ir-peephole` instead.
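
A rough sketch of what such a callback looks like (the pass class name below
is assumed for illustration; `PB` is a `PassBuilder` instance):

```cpp
PB.registerPipelineParsingCallback(
    [](StringRef Name, FunctionPassManager &FPM,
       ArrayRef<PassBuilder::PipelineElement>) {
      if (Name == "bpf-ir-peephole") {
        FPM.addPass(BPFIRPeepholePass()); // assumed NPM pass class
        return true;
      }
      return false;
    });
```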
2023-06-24 22:44:06 -07:00
Bjorn Pettersson
2dd221fe48 Remove no longer needed includes of LegacyPassManager.h
Most of the removed includes should probably have been removed already
when we removed TargetMachine::adjustPassManager.
2023-02-06 13:38:57 +01:00
Nick Desaulniers
19a004b468 [llvm][SelectionDAGISel] support -{start|stop}-{before|after}= for remaining targets
Follow up to the series:
1. https://reviews.llvm.org/D140161
2. https://reviews.llvm.org/D140349
3. https://reviews.llvm.org/D140331
4. https://reviews.llvm.org/D140323

Completes the work from the previous two for remaining targets.

This creates the following named passes that can be run via
`llc -{start|stop}-{before|after}`:
- arc-isel
- arm-isel
- avr-isel
- bpf-isel
- csky-isel
- hexagon-isel
- lanai-isel
- loongarch-isel
- m68k-isel
- msp430-isel
- mips-isel
- nvptx-isel
- ppc-codegen
- riscv-isel
- sparc-isel
- systemz-isel
- ve-isel
- wasm-isel
- xcore-isel

A nice way to write tests for SelectionDAGISel might be to use a RUN:
line like:
llc -mtriple=<triple> -start-before=<arch>-isel -stop-after=finalize-isel -o -

Fixes: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: asb, zixuan-wu

Differential Revision: https://reviews.llvm.org/D140364
2022-12-21 13:25:15 -08:00
Fangrui Song
bac974278c CodeGen/CommandFlags: Convert Optional to std::optional 2022-12-03 18:38:12 +00:00
Krzysztof Parzyszek
8c7c20f033 Convert Optional<CodeModel> to std::optional<CodeModel> 2022-12-03 12:08:47 -06:00
Bjorn Pettersson
99c47d9e31 Remove TargetMachine::adjustPassManager
Since opt no longer supports running default (O0/O1/O2/O3/Os/Oz)
pipelines using the legacy PM, there are no in-tree uses of
TargetMachine::adjustPassManager remaining. This patch removes the
no longer used adjustPassManager functions.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D137796
2022-11-28 10:24:16 +01:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Jameson Nash
c4b1a63a1b mark getTargetTransformInfo and getTargetIRAnalysis as const
Seems like this can be const, since Passes shouldn't modify it.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D120518
2022-02-25 14:30:44 -05:00
Yonghong Song
009f3a89d8 BPF: remove intrinsics @llvm.stacksave() and @llvm.stackrestore()
Paul Chaignon reported a bpf verifier failure ([1]) due to using
non-ABI register R11. For the test case, llvm11 is okay while
llvm12 and later generates verifier unfriendly code.

The failure is related to variable length array size.
The following mimics the variable length array definition
in the test case:

struct t { char a[20]; };
void foo(void *);
int test() {
   const int a = 8;
   char tmp[AA + sizeof(struct t) + a];
   foo(tmp);
   ...
}

Paul helped bisect that the following llvm commit is
responsible:

552c6c232872 ("PR44406: Follow behavior of array bound constant
              folding in more recent versions of GCC.")

Basically, before the above commit, the clang frontend did constant
folding for the array size "AA + sizeof(struct t) + a" to 68,
so alloca was used for stack allocation. After the above commit,
the clang frontend no longer did constant folding for the array size,
which results in a VLA, and llvm.stacksave/llvm.stackrestore
are generated.

The BPF architecture does not support a stack pointer (sp) register.
LLVM internally uses R11 to represent the sp register, but it should
not appear in the final code. Otherwise, the kernel verifier will reject it.

An earlier patch ([2]) tried to fix the issue in the clang frontend.
But the upstream discussion considered a frontend fix to be a hack;
the backend should properly undo llvm.stacksave/llvm.stackrestore.
This patch implements a bpf IR phase to remove these intrinsics
unconditionally. If the alloca can eventually be resolved with a
constant size, r11 will not be generated. If the alloca cannot be
resolved with a constant size, SelectionDAG will complain, the same
as without this patch.

 [1] https://lore.kernel.org/bpf/20210809151202.GB1012999@Mem/
 [2] https://reviews.llvm.org/D107882

Differential Revision: https://reviews.llvm.org/D111897
2021-10-18 09:51:19 -07:00
Reid Kleckner
89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Tarindu Jayatilaka
7a797b2902 Take OptimizationLevel class out of Pass Builder
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D107025
2021-07-29 21:57:23 -07:00
Arthur Eubanks
34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Yonghong Song
a260ae7160 BPF: Implement TTI.IntImmCost() properly
This patch implements TTI.getIntImmCost() properly.
Each BPF insn has a 32-bit immediate space, so any immediate
which can be represented as a 32-bit signed int is technically
free. If an int cannot be represented as a 32-bit signed int,
a ld_imm64 instruction is needed and a TCC_Basic is returned.
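
A rough sketch of the described cost model (the exact signature is an
assumption, not copied from the patch):

```cpp
InstructionCost BPFTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
                                          TTI::TargetCostKind CostKind) {
  if (Imm.getSignificantBits() <= 32) // fits the 32-bit signed immediate field
    return TTI::TCC_Free;
  return TTI::TCC_Basic; // needs a separate ld_imm64 instruction
}
```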

This change is motivated by the observation that
several bpf selftests failed with the latest llvm trunk, e.g.,
  #10/16 strobemeta.o:FAIL
  #10/17 strobemeta_nounroll1.o:FAIL
  #10/18 strobemeta_nounroll2.o:FAIL
  #10/19 strobemeta_subprogs.o:FAIL
  #96 snprintf_btf:FAIL

The reason for the failure is that
SpeculateAroundPHIsPass did an aggressive transformation
which alters control flow in ways the verifier currently
cannot handle well. In llvm12, SpeculateAroundPHIsPass
is not called.

SpeculateAroundPHIsPass relied on TTI.getIntImmCost()
and TTI.getIntImmCostInst() for profitability
analysis. This patch implements TTI.getIntImmCost()
properly for the BPF backend, which also prevents the
transformation which caused the above test failures.

Differential Revision: https://reviews.llvm.org/D96448
2021-02-11 08:35:25 -08:00
Kazu Hirata
8a20e2b3d3 [llvm] Use Optional::getValueOr (NFC) 2021-01-12 21:43:50 -08:00
Arthur Eubanks
92a67e131f [BPF][NewPM] Port bpf-adjust-opt to NPM and add it to pipeline
Reviewed By: yonghong-song

Differential Revision: https://reviews.llvm.org/D91990
2020-11-26 10:11:26 -08:00
Arthur Eubanks
ab0ddbc38a Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 13:11:40 -08:00
Arthur Eubanks
9173b5a99d Revert "[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback"
This reverts commit 7a83aa0520d24ee5285a9c60b97b57a1db1d65e8.

Causing buildbot failures.
2020-11-04 12:57:32 -08:00
Arthur Eubanks
7a83aa0520 [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 12:53:30 -08:00
Yonghong Song
ddf1864ace BPF: add AdjustOpt IR pass to generate verifier friendly codes
Add an IR phase right before main module optimization.
This is to modify IR to restrict certain downward optimizations
in order to generate verifier friendly code.
  > prevent certain instcombine optimizations, handling both
    in-block/cross-block instcombines.
  > avoid speculative code motion if the variable used in
    condition is also used in the later blocks.

Internally, a bpf IR builtin
  result = __builtin_bpf_passthrough(seq_num, result)
is used to enforce ordering. This builtin is only used
during target independent IR optimizations and it will
be removed at the beginning of target dependent IR
optimizations.

For example, removing the following workaround,
  --- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
  +++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
  @@ -47,7 +47,7 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx)
          /* a workaround to prevent compiler from generating
           * codes verifier cannot handle yet.
           */
  -       volatile int ret;
  +       int ret;
this patch is able to generate code which passed the verifier.

To disable optimization, users need to use "opt" command like below:
  clang -target bpf -O2 -S -emit-llvm -Xclang -disable-llvm-passes test.c
  // disable icmp serialization
  opt -O2 -bpf-disable-serialize-icmp test.ll | llvm-dis > t.ll
  // disable avoid-speculation
  opt -O2 -bpf-disable-avoid-speculation test.ll | llvm-dis > t.ll
  llc t.ll

Differential Revision: https://reviews.llvm.org/D85570
2020-10-07 08:49:10 -07:00
Arthur Eubanks
40251fee00 [BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer pipeline
This involves porting BPFAbstractMemberAccess and BPFPreserveDIType to
NPM, then adding them to BPFTargetMachine::registerPassBuilderCallbacks
(the NPM equivalent of adjustPassManager()).

Reviewed By: yonghong-song, asbirlea

Differential Revision: https://reviews.llvm.org/D88855
2020-10-06 07:42:32 -07:00
Yonghong Song
54d9f743c8 BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible
Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.

Currently, compiler may transform the above code
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
    p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
    bpf_probe_read(buf, buf_size, p2);
  }
to
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    bpf_probe_read(buf, buf_size, p2);
  }
and eventually assembly code looks like
  reloc_exist = 1;
  reloc_member_offset = 10; //calculate member offset from base
  p2 = base + reloc_member_offset;
  if (reloc_exist) {
    bpf_probe_read(bpf, buf_size, p2);
  }
if during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), reloc_member_offset relocation cannot
be resolved and will be patched with illegal instruction.
This will cause verifier failure.

This patch attempts to address this issue by doing chaining
analysis and replacing chains with special globals right
after clang code gen. This will remove the cse possibility
described above. The IR typically looks like
  %6 = load @llvm.sk_buff:0:50$0:0:0:2:0
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.

But this transformation has another consequence: code sinking
may happen like below:
  PHI = <possibly different @preserve_*_access_globals>
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6

For such cases, we will not be able to generate relocations since
multiple relocations are merged into one.

This patch introduces a passthrough builtin
to prevent such optimization. Inline assembly seems to have more
impact on optimization, e.g., inlining. Using the passthrough builtin has
less impact on optimizations.

A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:
  - report fatal error if any reloc global in PHI nodes
  - remove all bpf passthrough builtin functions

Changes for existing CORE tests:
  - for clang tests, add "-Xclang -disable-llvm-passes" flags to
    avoid builtin->reloc_global transformation so the test is still
    able to check correctness for clang generated IR.
  - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command
    before "llc" command since "opt" is needed to call newly-placed
    builtin->reloc_global transformation. Add target triple in the IR
    file since "opt" requires it.
  - Since target triple is added in IR file, if a test may produce
    different results for different endianness, two tests will be
    created, one for bpfeb and another for bpfel, e.g., some tests
    for relocation of lshift/rshift of bitfields.
  - field-reloc-bitfield-1.ll has different relocations compared to
    old codes. This is because for the structure in the test,
    new code returns struct layout alignment 4 while old code
    is 8. Align 8 is more precise and permits double load. With align 4,
    the new mechanism uses 4-byte load, so generating different
    relocations.
  - test intrinsic-transforms.ll is removed. This is used to test
    cse on intrinsics so we do not lose metadata. Now metadata is attached
    to global and not instruction, it won't get lost with cse.

Differential Revision: https://reviews.llvm.org/D87153
2020-09-28 16:56:22 -07:00
Yonghong Song
87cba43402 BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization
The following bpf linux kernel selftest failed with latest
llvm:
  $ ./test_progs -n 7/10
  ...
  The sequence of 8193 jumps is too complex.
  verification time 126272 usec
  stack depth 320
  processed 114799 insns (limit 1000000)
  ...
  libbpf: failed to load object 'pyperf600_nounroll.o'
  test_bpf_verif_scale:FAIL:110
  #7/10 pyperf600_nounroll.o:FAIL
  #7 bpf_verif_scale:FAIL

After some investigation, I found the following llvm patch
  https://reviews.llvm.org/D84108
is responsible. The patch disabled hoisting of common instructions
in SimplifyCFG by default. Later on, the code changes and a
SimplifyCFG phase with hoisting enabled can no longer do the work.

A test is provided to demonstrate the problem.
The IR before simplifyCFG looks like:
  for.cond:
    %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
    %cmp = icmp ult i32 %i.0, 6
    br i1 %cmp, label %for.body, label %for.cond.cleanup

  for.cond.cleanup:
    %2 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %cmp2 = icmp eq i8* %2, null
    %conv = zext i1 %cmp2 to i32
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
    ret i32 %conv

  for.body:
    %3 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %tobool.not = icmp eq i8* %3, null
    br i1 %tobool.not, label %for.inc, label %land.lhs.true

The first two insns of `for.cond.cleanup` and `for.body`, load and
icmp, can be hoisted to the `for.cond` block. With patch D84108, the
optimization is delayed. But unfortunately, later on loop rotation
adds additional phi nodes to `for.body` and the hoisting cannot
be done any more.

Note such hoisting is beneficial to bpf programs as the
bpf verifier does path-sensitive analysis and verification.
The hoisting prevents reloading from the stack, which would assume
a conservative value and increase the number of processed insns.
In this case, it caused a verifier failure.

To fix this problem, I added an IR pass in the bpf target
to perform an additional SimplifyCFG with hoisting of common
instructions enabled.

Differential Revision: https://reviews.llvm.org/D85434
2020-08-06 13:16:00 -07:00
Yonghong Song
6b01b46538 [BPF] preserve debuginfo types for builtin __builtin__btf_type_id()
The builtin function
  u32 btf_type_id = __builtin_btf_type_id(param, 0)
can help preserve type info for the following use case:
  extern void foo(..., void *data, int size);
  int test(...) {
    struct t { int a; int b; int c; } d;
    d.a = ...; d.b = ...; d.c = ...;
    foo(..., &d, sizeof(d));
  }

The function "foo" in the above only see raw data and does not
know what type of the data is. In certain cases, e.g., logging,
the additional type information will help pretty print.

This patch handles the builtin in the BPF backend. It includes
an IR pass to translate the IR intrinsic to a load of
a global variable which carries the metadata, and an MI
pass to remove the intermediate load of the global variable.
Finally, in the AsmPrinter pass, the proper instructions are generated.

In the above example, the second argument for __builtin_btf_type_id()
is 0, which means a relocation for local adjustment
(i.e., w.r.t. the bpf program's BTF changes) will be generated.
The value 1 for the second argument means
a relocation for remote adjustment, e.g., against vmlinux.

Differential Revision: https://reviews.llvm.org/D74572
2020-05-15 08:00:44 -07:00