llvm-project

Author	SHA1	Message	Date
Andrew Rogers	19658d1474	[llvm] annotate interfaces in llvm/Target for DLL export (#143615 ) ## Purpose This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build. ## Background This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst). A subset of these changes was generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`. The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for these functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it I was concerned it would unnecessarily impact compile times. In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC. ## Validation Local builds and tests to validate cross-platform compatibility. 
This included llvm, clang, and lldb on the following configurations: - Windows with MSVC - Windows with Clang - Linux with GCC - Linux with Clang - Darwin with Clang	2025-06-17 13:28:45 -07:00
yonghong-song	ab391beb11	[BPF] Handle traps with kfunc call __bpf_trap (#131731 ) Currently, the middle-end generates an 'unreachable' insn if the compiler decides the code is indeed unreachable or the code becomes invalid due to some optimization (e.g. code optimization with uninitialized variables). Right now the BPF backend ignores the 'unreachable' insn during selectiondag lowering. For cases where 'unreachable' is due to an invalid code transformation, such a signal will be missed. Later on, users need some effort to debug it, which impacts developer productivity. This patch enables selectiondag lowering for the 'unreachable' insn. A previous attempt ([1]) tried to have a backend IR pass filter out 'unreachable' insns in a number of cases. But such pattern matching may misalign with future middle-end optimization with 'unreachable' insns. This patch takes a different approach. The 'unreachable' insn is lowered with a special encoding in the bpf object file and the verifier will do proper verification for the bpf prog. More specifically, the 'unreachable' insn is replaced by a __bpf_trap() function. This function will be a kfunc (in the ".ksyms" section) with a weak attribute, but does not have a definition. The actual kfunc definition is expected to be in the kernel. The __bpf_trap() extern function is also encoded in BTF. The name __bpf_trap() is chosen to satisfy the reserved identifier requirement. Besides the uninitialized variable case, the builtin function '__builtin_trap' can also generate kfunc __bpf_trap(). For example in [3], we have ``` # define __bpf_unreachable() __builtin_trap() ``` If the compiler didn't remove __builtin_trap() during middle-end optimization, compilation will fail. With this patch, compilation will not fail and __builtin_trap() is converted to the __bpf_trap() kfunc. The eventual failure will be in the verifier instead of llvm compilation. To keep the compile-time failure, the user can add an option like `-ftrap-function=<something>`. 
I tested this patch on bpf selftests and all tests are passed. I also tried original example in [2] and the code looks like below: ``` ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = (u32 )(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 10: 85 10 00 00 ff ff ff ff call -0x1 0000000000000050: R_BPF_64_32 __bpf_trap 11: 95 00 00 00 00 00 00 00 exit <END> ``` Eventually kernel verifier will emit the following logs: ``` 10: (85) call __bpf_trap#74479 unexpected __bpf_trap() due to uninitialized variable? ``` In another internal sched-ext bpf prog, with the patch we have bpf code: ``` Disassembly of section .text: 0000000000000000 <scx_storage_init_single>: ; { 0: bc 13 00 00 00 00 00 00 w3 = w1 1: b4 01 00 00 00 00 00 00 w1 = 0x0 ; const u32 zero = 0; ... 0000000000003a80 <create_dom>: ; { 1872: bc 16 00 00 00 00 00 00 w6 = w1 ; bpf_printk("dom_id %d", dom_id); 1873: 18 01 00 00 3f 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x3f ll 0000000000003a88: R_BPF_64_64 .rodata 1875: b4 02 00 00 0a 00 00 00 w2 = 0xa 1876: bc 63 00 00 00 00 00 00 w3 = w6 1877: 85 00 00 00 06 00 00 00 call 0x6 ; ret = scx_bpf_create_dsq(dom_id, 0); 1878: bc 61 00 00 00 00 00 00 w1 = w6 1879: b4 02 00 00 00 00 00 00 w2 = 0x0 1880: 85 10 00 00 ff ff ff ff call -0x1 0000000000003ac0: R_BPF_64_32 scx_bpf_create_dsq ; domc->node_cpumask = node_data[node_id]; 1881: 85 10 00 00 ff ff ff ff call -0x1 0000000000003ac8: R_BPF_64_32 __bpf_trap 1882: 95 00 00 00 00 00 00 00 exit <END> ``` The verifier can easily report the error too. 
A bpf flag `-bpf-disable-trap-unreachable` is introduced to disable trapping for 'unreachable' or __builtin_trap. [1] https://github.com/llvm/llvm-project/pull/126858 [2] https://github.com/msune/clang_bpf/blob/main/Makefile#L3 [3] https://github.com/libbpf/libbpf/blob/master/src/bpf_helpers.h	2025-05-27 13:34:15 -07:00
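The `__bpf_unreachable()` pattern quoted in the message above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `hdr_len` and its protocol numbers are hypothetical and do not come from the commit, and on a host compiler `__builtin_trap()` simply aborts. According to the message, it is only when targeting BPF with this patch that the call is lowered to the `__bpf_trap()` kfunc and left for the verifier to reject.

```c
#include <stdint.h>

/* Same shape as the libbpf helper macro quoted in the commit message. */
#define __bpf_unreachable() __builtin_trap()

/* Hypothetical helper (not from the commit): map a protocol id to a
 * fixed header length. */
static inline uint32_t hdr_len(uint32_t proto)
{
    switch (proto) {
    case 4:
        return 20;
    case 6:
        return 40;
    default:
        /* The programmer claims this path is never taken. If the compiler
         * cannot remove it, the trap survives into the generated code. */
        __bpf_unreachable();
    }
}
```

For the valid inputs, `hdr_len(4)` returns 20 and `hdr_len(6)` returns 40; any other value reaches the trap.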
Matthias Braun	675cb70641	Register assembly printer passes (#138348 ) Register assembly printer passes in the pass registry. This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests. Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow targets to use the `INITIALIZE_PASS` macros and register the pass in the pass registry. This currently has a default parameter so it won't break any targets that have not been updated.	2025-05-06 18:01:17 -07:00
Sergei Barannikov	bb1765179e	[TTI] Simplify implementation (NFCI) (#136674 ) Replace "concept based polymorphism" with simpler PImpl idiom. This pursues two goals: * Enforce static type checking. Previously, target implementations hid base class methods and type checking was impossible. Now that they override the methods, the compiler will complain on mismatched signatures. * Make the code easier to navigate. Previously, if you asked your favorite LSP server to show a method (e.g. `getInstructionCost()`), it would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`, `TTIImplBase`, and target overrides. Now it is two less :) There are three commits to hopefully simplify the review. The first commit removes `TTI::Model`. This is done by deriving `TargetTransformInfoImplBase` from `TTI::Concept`. This is possible because they implement the same set of interfaces with identical signatures. The first commit makes `TargetTransformImplBase` polymorphic, which means all derived classes should `override` its methods. This is done in second commit to make the first one smaller. It appeared infeasible to extract this into a separate PR because the first commit landed separately would result in tons of `-Woverloaded-virtual` warnings (and break `-Werror` builds). The third commit eliminates `TTI::Concept` by merging it with the only derived class `TargetTransformImplBase`. This commit could be extracted into a separate PR, but it touches the same lines in `TargetTransformInfoImpl.h` (removes `override` added by the second commit and adds `virtual`), so I thought it may make sense to land these two commits together. Pull Request: https://github.com/llvm/llvm-project/pull/136674	2025-04-26 15:25:40 +03:00
Rahul Joshi	1356e202b2	[NFC][LLVM][BPF] Cleanup pass initialization for BPF (#134414 ) - Remove calls to pass initialization from pass constructors and move them to target initialization. - https://github.com/llvm/llvm-project/issues/111767	2025-04-07 17:27:26 -07:00
Kazu Hirata	ed8019d9fb	[Target] Remove unused includes (NFC) (#116577 ) Identified with misc-include-cleaner.	2024-11-18 07:19:50 -08:00
Matin Raayai	bb3f5e1fed	Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234 ) Following discussions in #110443, and the following earlier discussions in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html, https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine` interface classes. More specifically: 1. Makes `TargetMachine` the only class implemented under `TargetMachine.h` in the `Target` library. 2. `TargetMachine` contains target-specific interface functions that relate to IR/CodeGen/MC constructs, whereas before (at least on paper) it was supposed to have only IR/MC constructs. Any Target that doesn't want to use the independent code generator simply does not implement them, and returns either `false` or `nullptr`. 3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming aims to make the purpose of `LLVMTargetMachine` clearer. Its interface was moved under the CodeGen library, to further emphasis its usage in Targets that use CodeGen directly. 4. Makes `TargetMachine` the only interface used across LLVM and its projects. With these changes, `CodeGenCommonTMImpl` is simply a set of shared function implementations of `TargetMachine`, and CodeGen users don't need to static cast to `LLVMTargetMachine` every time they need a CodeGen-specific feature of the `TargetMachine`. 5. More importantly, does not change any requirements regarding library linking. cc @arsenm @aeubanks	2024-11-14 13:30:05 -08:00
Shilei Tian	dc45ff1d2a	[PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (#114547 ) The early simplification pipeline is used in the non-LTO and (Thin/Full)LTO pre-link stages. There are some passes that we want in non-LTO mode but not at the LTO pre-link stage. This control is currently missing; this PR adds it. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in the pre-link stage causes some issues.	2024-11-03 23:24:10 -05:00
yonghong-song	06c531e808	BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 (#106494 ) This patch contains two parts: - first, revert the patch https://github.com/llvm/llvm-project/pull/101428. - second, remove the `atomic_fetch_and_*()` to `atomic_<op>()` conversion (when the return value is not used), but preserve `__sync_fetch_and_add()` to locked insn with cpu v1/v2.	2024-08-30 14:00:33 -07:00
yonghong-song	c566769d7c	BPF: Ensure __sync_fetch_and_add() always generate atomic_fetch_add insn (#101428 ) Peilen Ye reported an issue ([1]) where for __sync_fetch_and_add(...) without return value like __sync_fetch_and_add(&foo, 1); llvm BPF backend generates locked insn e.g. lock *(u32 *)(r1 + 0) += r2 If __sync_fetch_and_add(...) returns a value like res = __sync_fetch_and_add(&foo, 1); llvm BPF backend generates like r2 = atomic_fetch_add((u32 *)(r1 + 0), r2) The above generation of 'lock *(u32 *)(r1 + 0) += r2' caused a problem in jit since proper barrier is not inserted. The above discrepancy is due to commit [2] where it tries to maintain backward compatibility since before commit [2], __sync_fetch_and_add(...) generates lock insn in BPF backend. Based on discussion in [1], now it is time to fix the above discrepancy so we can have proper barrier support in jit. This patch made sure that __sync_fetch_and_add(...) always generates atomic_fetch_add(...) insns. Now 'lock *(u32 *)(r1 + 0) += r2' can only be generated by inline asm. I also removed the whole BPFMIChecking.cpp file whose original purpose is to detect and issue errors if XADD{W,D,W32} may return a value used subsequently. Since insns XADD{W,D,W32} are all inline asm only now, such error detection is not needed. [1] https://lore.kernel.org/bpf/ZqqiQQWRnz7H93Hc@google.com/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f [2] `286daafd65` Co-authored-by: Yonghong Song <yonghong.song@linux.dev>	2024-08-04 21:03:16 -07:00
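The two source-level forms that the message above distinguishes can be sketched as below. This is a minimal illustration, not code from the commit: the function names `counter_inc` and `counter_fetch_inc` are hypothetical. Which BPF instruction each form produces (a locked add or `atomic_fetch_add`) depends on the LLVM version and `-mcpu` level, as the commit messages describe; on a host compiler both are ordinary atomic adds.

```c
#include <stdint.h>

/* Result unused: the form the message says was historically emitted as a
 * locked add in the BPF backend. */
static inline void counter_inc(uint32_t *counter)
{
    __sync_fetch_and_add(counter, 1);
}

/* Result used: the builtin returns the value held before the add, so the
 * backend needs an instruction that produces the old value. */
static inline uint32_t counter_fetch_inc(uint32_t *counter)
{
    return __sync_fetch_and_add(counter, 1);
}
```

The only source difference is whether the old value is consumed, which is exactly the property the backend used to pick between the two encodings.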
Nikita Popov	5cd0ba30f5	Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321 ) (#96462 ) On MSVC the `this` uses inside `decltype` require a lambda capture. On clang they result in an unused capture warning instead. Add the capture and suppress the warning with `(void)this`. ----- Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.	2024-06-24 15:00:11 +02:00
Nikita Popov	e5a41f0afc	Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321 )" My attempt to fix the Windows build made things worse, revert entirely for now. This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c. This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5. This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.	2024-06-24 10:32:03 +02:00
Nikita Popov	957dc4366d	[IR] Lazily initialize the class to pass name mapping (NFC) (#96321 ) Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.	2024-06-24 09:40:09 +02:00
paperchalice	7652a59407	Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149 ) - Fix build with `EXPENSIVE_CHECKS` - Remove unused `PassName::ID` to resolve warning - Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly	2024-06-04 08:10:58 +08:00
paperchalice	8917afaf0e	Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146 ) This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to de37c06f01772e02465ccc9f538894c76d89a7a1 It still breaks EXPENSIVE_CHECKS build. Sorry.	2024-06-02 14:31:52 +08:00
paperchalice	d2cdc8ab45	[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567 ) Port selection dag isel to new pass manager. Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.	2024-06-02 09:12:33 +08:00
paperchalice	2aa5bae0c0	[NewPM][BPF] Add BPFPassRegistry.def NFCI (#86241 ) Prepare migration for dag-isel.	2024-03-23 12:53:26 +08:00
4ast	2aacb56e83	BPF address space insn (#84410 ) This commit aims to support BPF arena kernel side [feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/):
- arena is a memory region accessible from both BPF program and userspace;
- base pointers for this memory region differ between kernel and user spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)` translates src_reg, a pointer in src_addr_space, to dst_reg, the equivalent pointer in dst_addr_space; {src,dst}_addr_space are immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on arena pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from arena is done via pointer in user address space, thus convert base pointers using `addr_space_cast(src_reg, 0, 1)`.

Only load, store, cmpxchg and atomicrmw IR instructions are handled by this transformation. For example, the following C code:
```c
#define __as __attribute__((address_space(1)))
void copy(int __as *from, int __as *to) { *to = *from; }
```
Compiled to the following IR:
```llvm
define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
entry:
  %0 = load i32, ptr addrspace(1) %from, align 4
  store i32 %0, ptr addrspace(1) %to, align 4
  ret void
}
```
Is transformed to:
```llvm
%to2 = addrspacecast ptr addrspace(1) %to to ptr ;; !
%from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
%0 = load i32, ptr %from1, align 4, !tbaa !3
store i32 %0, ptr %to2, align 4, !tbaa !3
ret void
```
And compiled as:
```asm
r2 = addr_space_cast(r2, 0, 1)
r1 = addr_space_cast(r1, 0, 1)
r1 = *(u32 *)(r1 + 0)
*(u32 *)(r2 + 0) = r1
exit
```
Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>	2024-03-13 02:27:25 +02:00
Rishabh Bali	fe42e72db2	[CodeGen] Port AtomicExpand to new Pass Manager (#71220 ) Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559	2024-02-25 18:42:22 +05:30
James Y Knight	b856e77b2d	Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703 ) Targets affected: - NVPTX and BPF: set to 64 bits. - ARC, Lanai, and MSP430: set to 0 (they don't implement atomics). Those which didn't yet add AtomicExpandPass to their pass pipeline now do so. This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass. On all these targets, this now matches what Clang already does in the frontend. The only targets which do not configure AtomicExpandPass now are: - DirectX and SPIRV: they aren't normal backends. - AVR: a single-cpu architecture with no privileged/user divide, which could implement all atomics by disabling/enabling interrupts, regardless of size/alignment. Will be addressed by future work.	2024-01-08 22:34:28 -05:00
paperchalice	ffb1f20e0d	[CodeGen] Add flag to populate target pass names (#76328 ) `print-pipeline-passes` can show target pass names.	2024-01-03 09:07:02 +08:00
Yingchi Long	2460bf2fac	[BPF][GlobalISel] add initial gisel support for BPF (#74999 ) This adds initial GlobalISel codegen support for the BPF backend. Only the IR translator for "RET" is implemented (instruction selection is not supported yet). Depends on: #74998	2023-12-11 19:58:34 +08:00
Eduard Zingerman	030b8cb156	[BPF] Attribute preserve_static_offset for structs This commit adds a new BPF specific structure attribute `__attribute__((preserve_static_offset))` and a pass to deal with it. This attribute may be attached to a struct or union declaration, where it notifies the compiler that this structure is a "context" structure. The following limitations apply to context structures: - runtime environment might patch access to the fields of this type by updating the field offset; BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed. This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions. See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details. - runtime environment might disallow access to the field of the type through modified pointers. During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. If a register with such a tag is modified, BPF programs are not allowed to read or write memory using that register. See kernel/bpf/verifier.c:check_mem_access function in the Linux kernel source tree for details. Access to the structure fields is translated to IR as a sequence: - `(load (getelementptr %ptr %offset))` or - `(store (getelementptr %ptr %offset))` During instruction selection phase such sequences are translated as a single load instruction with embedded offset, e.g. `LDW %ptr, %offset`, which matches access pattern necessary for the restricted set of types described above (when `%offset` is static). 
Multiple optimizer passes might separate these instructions, this includes: - SimplifyCFGPass (sinking) - InstCombine (sinking) - GVN (hoisting) The `preserve_static_offset` attribute marks structures for which the following transformations happen: - at the early IR processing stage: - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`; - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`; - at the late IR processing stage this modification is undone. Such handling prevents various optimizer passes from generating sequences of instructions that would be rejected by BPF verifier. The __attribute__((preserve_static_offset)) has a priority over __attribute__((preserve_access_index)). When preserve_access_index attribute is present preserve access index transformations are not applied. This addresses the issue reported by the following thread: https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 This is a second attempt to commit this change, previous reverted commit is: cb13e9286b6d4e384b5d4203e853d44e2eff0f0f. The following items had been fixed: - test case bpf-preserve-static-offset-bitfield.c now uses `-triple bpfel` to avoid different codegen for little/big endian targets. - BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid use after free for `WorkList` elements `V`. Differential Revision: https://reviews.llvm.org/D133361	2023-12-05 19:21:42 +02:00
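A minimal sketch of how a context structure might be marked with the attribute described above. The `__ctx` macro name, `struct demo_ctx`, and `read_mark` are hypothetical and chosen only for illustration; the `__has_attribute` guard is an assumption added here so the snippet also builds with compilers or targets that do not know the attribute, in which case the struct is left unannotated.

```c
#include <stdint.h>

/* Use the attribute only where the compiler reports support for it. */
#if defined(__has_attribute)
#if __has_attribute(preserve_static_offset)
#define __ctx __attribute__((preserve_static_offset))
#endif
#endif
#ifndef __ctx
#define __ctx /* attribute unavailable: plain struct */
#endif

/* Hypothetical "context" structure, standing in for types such as
 * struct __sk_buff that the message mentions. */
struct demo_ctx {
    uint32_t len;
    uint32_t mark;
} __ctx;

/* A field read through an unmodified context pointer: the access the
 * attribute is meant to keep as a single load with a static offset. */
static inline uint32_t read_mark(const struct demo_ctx *ctx)
{
    return ctx->mark;
}
```

On a BPF target with the attribute available, the intent described in the message is that `ctx->mark` stays a load at a constant offset from `ctx` rather than being split by sinking or hoisting.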
Eduard Zingerman	2484469803	Revert "[BPF] Attribute preserve_static_offset for structs" This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f. Buildbot reports MSAN failures in tests added in this commit: https://lab.llvm.org/buildbot/#/builders/5/builds/38806 Failing tests: LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll	2023-11-30 22:29:45 +02:00
Eduard Zingerman	cb13e9286b	[BPF] Attribute preserve_static_offset for structs This commit adds a new BPF specific structure attribute `__attribute__((preserve_static_offset))` and a pass to deal with it. This attribute may be attached to a struct or union declaration, where it notifies the compiler that this structure is a "context" structure. The following limitations apply to context structures: - runtime environment might patch access to the fields of this type by updating the field offset; BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed. This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions. See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details. - runtime environment might disallow access to the field of the type through modified pointers. During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. If a register with such a tag is modified, BPF programs are not allowed to read or write memory using that register. See kernel/bpf/verifier.c:check_mem_access function in the Linux kernel source tree for details. Access to the structure fields is translated to IR as a sequence: - `(load (getelementptr %ptr %offset))` or - `(store (getelementptr %ptr %offset))` During instruction selection phase such sequences are translated as a single load instruction with embedded offset, e.g. `LDW %ptr, %offset`, which matches access pattern necessary for the restricted set of types described above (when `%offset` is static). 
Multiple optimizer passes might separate these instructions, this includes: - SimplifyCFGPass (sinking) - InstCombine (sinking) - GVN (hoisting) The `preserve_static_offset` attribute marks structures for which the following transformations happen: - at the early IR processing stage: - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`; - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`; - at the late IR processing stage this modification is undone. Such handling prevents various optimizer passes from generating sequences of instructions that would be rejected by BPF verifier. The __attribute__((preserve_static_offset)) has a priority over __attribute__((preserve_access_index)). When preserve_access_index attribute is present preserve access index transformations are not applied. This addresses the issue reported by the following thread: https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 Differential Revision: https://reviews.llvm.org/D133361	2023-11-30 19:45:03 +02:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Eduard Zingerman	651e644595	[BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree() Replace `BPFMIPeepholeTruncElim` by adding an overload for `TargetLowering::isZExtFree()` aware that zero extension is free for `ISD::LOAD`. Short description ================= The `BPFMIPeepholeTruncElim` handles two patterns: Pattern #1: %1 = LDB %0, ... %1 = LDB %0, ... %2 = AND_ri %1, 0xff -> %2 = MOV_ri %1 <-- (!) Pattern #2: bb.1: bb.1: %a = LDB %0, ... %a = LDB %0, ... br %bb3 br %bb3 bb.2: bb.2: %b = LDB %0, ... -> %b = LDB %0, ... br %bb3 br %bb3 bb.3: bb.3: %1 = PHI %a, %b %1 = PHI %a, %b %2 = AND_ri %1, 0xff %2 = MOV_ri %1 <-- (!) Plus variations: - AND_ri_32 instead of AND_ri - SLL/SLR instead of AND_ri - LDH, LDW, LDB32, LDH32, LDW32 Both patterns could be handled by built-in transformations at instruction selection phase if suitable `isZExtFree()` implementation is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`. When evaluating on BPF kernel selftests and remove_truncate_.ll LLVM test cases this revisions performs slightly better than BPFMIPeepholeTruncElim, see "Impact" section below for details. Commit also adds a few test cases to make sure that patterns in question are handled. Long description ================ Why this works: Pattern #1 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 %cond = icmp eq i8 %a, 0 ret i1 %cond } Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command: ... Type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 t19: i64 = and t16, Constant:i64<255> t17: i64 = setcc t19, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... 
Replacing.1 t19: i64 = and t16, Constant:i64<255> With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 and 0 other values ... Optimized type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t17: i64 = setcc t20, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Optimized type-legalized selection DAG: - `t19 = and t16, 255` had been replaced by `t16` (load). - Patterns like `(and (load ... i8), 255)` are replaced by `load` in `DAGCombiner::BackwardsPropagateMask` called from `DAGCombiner::visitAND`. - Similarly patterns like `(shl (srl ..., 56), 56)` are replaced by `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge, look for `TLI.shouldFoldConstantShiftPairToMask()` call). Why this works: Pattern #2 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 br label %next next: %cond = icmp eq i8 %a, 0 ret i1 %cond } Consider log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command. Log for first basic block: Initial selection DAG: %bb.0 'foo:entry' SelectionDAG has 9 nodes: t0: ch,glue = EntryToken t3: i64 = Constant<0> t2: i64,ch = CopyFromReg t0, Register:i64 %1 t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64 t6: i64 = zero_extend t5 t8: ch = CopyToReg t0, Register:i64 %0, t6 ... Replacing.1 t6: i64 = zero_extend t5 With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 and 0 other values ... 
Optimized lowered selection DAG: %bb.0 'foo:entry' SelectionDAG has 7 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %1 t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t8: ch = CopyToReg t0, Register:i64 %0, t9 Note: - Initial selection DAG: - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))` w/o special `isZExtFree()` overload added by this commit it is instead lowered as `t6 = (any_extend (load ...))`. - The decision to generate `zero_extend` or `any_extend` is done in `RegsForValue::getCopyToRegs` called from `SelectionDAGBuilder::CopyValueToVirtualRegister`: - if `isZExtFree()` for load returns true `zero_extend` is used; - `any_extend` is used otherwise. - Optimized lowered selection DAG: - `t6 = (any_extend (load ...))` is replaced by `t9 = load ..., zext from i8` This is done by `DagCombiner.cpp:tryToFoldExtOfLoad()` called from `DAGCombiner::visitZERO_EXTEND`. Log for second basic block: Initial selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t5: i8 = truncate t4 t8: i1 = setcc t5, Constant:i8<0>, seteq:ch t9: i64 = any_extend t8 t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Replacing.2 t18: i64 = and t4, Constant:i64<255> With: t4: i64 = AssertZext t2, ValueType:ch:i8 ... Type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t18: i64 = and t4, Constant:i64<255> t16: i64 = setcc t18, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... 
Optimized type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t16: i64 = setcc t4, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Initial selection DAG: - `t0` is an input value for this basic block, it corresponds to the load instruction (`t9`) from the first basic block. - It is accessed within the basic block via `t4` (AssertZext (CopyFromReg t0, ...)). - The `AssertZext` is generated by `RegsForValue::getCopyFromRegs` called from `SelectionDAGBuilder::getCopyFromRegs`, it is generated only when `LiveOutInfo` with a known number of leading zeros is present for `t0`. - Known register bits in `LiveOutInfo` are computed by `SelectionDAG::computeKnownBits` called from `SelectionDAGISel::ComputeLiveOutVRegInfo`. - `computeKnownBits()` generates leading zeros information for `(load ..., zext from ...)` but does not generate leading zeros information for `(load ..., anyext from ...)`. This is why `isZExtFree()` added in this commit is important. - Type-legalized selection DAG: - `t5 = truncate t4` is replaced by `t18 = and t4, 255` - Optimized type-legalized selection DAG: - `t18 = and t4, 255` is replaced by `t4`, this is done by `DAGCombiner::SimplifyDemandedBits` called from `DAGCombiner::visitAND`, which simplifies patterns like `(and (assertzext ...))` Impact ------ This change covers all remove_truncate_*.ll test cases: - for -mcpu=v4 there are no changes in the generated code; - for -mcpu=v2 code generated for remove_truncate_7 and remove_truncate_8 improved slightly, for other tests it is unchanged. 
For remove_truncate_7: Before this revision After this revision -------------------- ------------------- r1 <<= 0x20 r1 <<= 0x20 r1 >>= 0x20 r1 >>= 0x20 if r1 == 0x0 goto +0x2 <LBB0_2> if r1 == 0x0 goto +0x2 <LBB0_2> r1 = (u32 )(r2 + 0x0) r0 = (u32 )(r2 + 0x0) goto +0x1 <LBB0_3> goto +0x1 <LBB0_3> <LBB0_2>: <LBB0_2>: r1 = (u32 )(r2 + 0x4) r0 = (u32 )(r2 + 0x4) <LBB0_3>: <LBB0_3>: r0 = r1 exit exit For remove_truncate_8: Before this revision After this revision -------------------- ------------------- r2 = (u32 )(r1 + 0x0) r2 = (u32 )(r1 + 0x0) r3 = r2 r3 = r2 r3 <<= 0x20 r3 <<= 0x20 r4 = r3 r3 s>>= 0x20 r4 s>>= 0x20 if r4 s> 0x2 goto +0x5 <LBB0_3> if r3 s> 0x2 goto +0x4 <LBB0_3> r4 = (u32 )(r1 + 0x4) r3 = (u32 )(r1 + 0x4) r3 >>= 0x20 if r3 >= r4 goto +0x2 <LBB0_3> if r2 >= r3 goto +0x2 <LBB0_3> r2 += 0x2 r2 += 0x2 (u32 )(r1 + 0x0) = r2 (u32 )(r1 + 0x0) = r2 <LBB0_3>: <LBB0_3>: r0 = 0x3 r0 = 0x3 exit exit For kernel BPF selftests statistics is as follows: (-mcpu=v4): - For -mcpu=v4: 9 out of 655 object files have differences, in all cases total number of instructions marginally decreased (-27 instructions). - For -mcpu=v2: 9 out of 655 object files have differences: - For 19 object files number of instruction decreased (-129 instruction in total): some redundant `rX &= 0xffff` and register to register assignments removed; - For 2 object files number of instructions increased +2 instructions in each file. Both -mcpu=v2 instruction increases could be reduced to the same example: define void @foo(ptr %p) { entry: %a = load i32, ptr %p, align 4 %b = sext i32 %a to i64 %c = icmp ult i64 1, %b br i1 %c, label %next, label %end next: call void inttoptr (i64 62 to ptr)(i32 %a) br label %end end: ret void } Note that this example uses value loaded to `%a` both as a sign extended (`%b`) and as zero extended (`%a` passed as parameter). 
Here is the difference in final assembly code: Before this revision After this revision -------------------- ------------------- r1 = *(u32 *)(r1 + 0) r1 = *(u32 *)(r1 + 0) r1 <<= 32 r1 <<= 32 r1 s>>= 32 r1 s>>= 32 if r1 < 2 goto <LBB0_2> if r1 < 2 goto <LBB0_2> r1 <<= 32 r1 >>= 32 call 62 call 62 <LBB0_2>: <LBB0_2>: exit exit Before this commit `%a` is passed to call as a sign extended value, after this commit `%a` is passed to call as a zero extended value, both are correct as 32-bit sub-register is the same. The difference comes from `DAGCombiner` operation on the initial DAG: Initial selection DAG before this commit: t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 t6: i64 = any_extend t5 <--------------------- (1) t8: ch = CopyToReg t0, Register:i64 %0, t6 t9: i64 = sign_extend t5 t12: i1 = setcc Constant:i64<1>, t9, setult:ch Initial selection DAG after this commit: t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 t6: i64 = zero_extend t5 <--------------------- (2) t8: ch = CopyToReg t0, Register:i64 %0, t6 t9: i64 = sign_extend t5 t12: i1 = setcc Constant:i64<1>, t9, setult:ch The node `t9` is processed before node `t6` and `load` instruction is combined to load with sign extension: Replacing.1 t9: i64 = sign_extend t5 With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64 and 0 other values Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 With: t31: i32 = truncate t30 and 1 other values This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from `DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6` which is `any_extend` in (1) and `zero_extend` in (2). `tryToFoldExtOfLoad()` rewrites such uses of `t5` differently: - `any_extend` is simply removed - `zero_extend` is replaced by `and t30, 0xffffffff`, which is later converted to a pair of shifts. This pair of shifts survives till the end of translation. 
Differential Revision: https://reviews.llvm.org/D157870	2023-08-22 00:04:51 +03:00
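The D157870 message above says that `and t30, 0xffffffff` is later converted to a pair of shifts, and that passing `%a` sign extended or zero extended is equally correct because the 32-bit sub-register is the same. A minimal standalone sketch of that arithmetic follows. It is not LLVM code, and all function names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero-extend the low 32 bits with a mask,
// mirroring `and x, 0xffffffff`.
constexpr std::uint64_t zextLow32ByMask(std::uint64_t x) {
  return x & 0xffffffffULL;
}

// Hypothetical helper: the same value via a shift pair,
// mirroring `r1 <<= 32; r1 >>= 32` (logical right shift).
constexpr std::uint64_t zextLow32ByShifts(std::uint64_t x) {
  return (x << 32) >> 32;
}

// Hypothetical helper: sign-extend the low 32 bits,
// mirroring `r1 <<= 32; r1 s>>= 32`. Assumes the signed right shift is
// arithmetic (guaranteed since C++20, usual behaviour before that).
constexpr std::int64_t sextLow32ByShifts(std::uint64_t x) {
  return static_cast<std::int64_t>(x << 32) >> 32;
}

// The mask and the shift pair agree on the full 64-bit result.
static_assert(zextLow32ByMask(0xffffffff80000000ULL) ==
              zextLow32ByShifts(0xffffffff80000000ULL));

// Sign and zero extension differ in the upper half but share the low
// 32 bits, which is all a 32-bit parameter reads.
static_assert(static_cast<std::uint32_t>(sextLow32ByShifts(0x80000000ULL)) ==
              static_cast<std::uint32_t>(zextLow32ByShifts(0x80000000ULL)));
```

The extra shift pair therefore costs two instructions without changing the value the callee observes in the 32-bit sub-register.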
Fangrui Song	2a61ceddb3	[BPF] Remove unused legacy passes after TargetMachine::adjustPassManager removal D137796 made these passes unused. `opt --bpf-ir-peephole` is specified in one test. Add a `registerPipelineParsingCallback` so that we can change the test to use `opt --passes=bpf-ir-peephole` instead.	2023-06-24 22:44:06 -07:00
Bjorn Pettersson	2dd221fe48	Remove no longer needed includes of LegacyPassManager.h Most of the removed includes should probably have been removed already when we removed TargetMachine::adjustPassManager.	2023-02-06 13:38:57 +01:00
Nick Desaulniers	19a004b468	[llvm][SelectionDAGISel] support -{start|stop}-{before|after}= for remaining targets Follow up to the series: 1. https://reviews.llvm.org/D140161 2. https://reviews.llvm.org/D140349 3. https://reviews.llvm.org/D140331 4. https://reviews.llvm.org/D140323 Completes the work from the previous two for remaining targets. This creates the following named passes that can be run via `llc -{start|stop}-{before|after}`: - arc-isel - arm-isel - avr-isel - bpf-isel - csky-isel - hexagon-isel - lanai-isel - loongarch-isel - m68k-isel - msp430-isel - mips-isel - nvptx-isel - ppc-codegen - riscv-isel - sparc-isel - systemz-isel - ve-isel - wasm-isel - xcore-isel A nice way to write tests for SelectionDAGISel might be to use a RUN: line like: llc -mtriple=<triple> -start-before=<arch>-isel -stop-after=finalize-isel -o - Fixes: https://github.com/llvm/llvm-project/issues/59538 Reviewed By: asb, zixuan-wu Differential Revision: https://reviews.llvm.org/D140364	2022-12-21 13:25:15 -08:00
Fangrui Song	bac974278c	CodeGen/CommandFlags: Convert Optional to std::optional	2022-12-03 18:38:12 +00:00
Krzysztof Parzyszek	8c7c20f033	Convert Optional<CodeModel> to std::optional<CodeModel>	2022-12-03 12:08:47 -06:00
Bjorn Pettersson	99c47d9e31	Remove TargetMachine::adjustPassManager Since opt no longer supports to run default (O0/O1/O2/O3/Os/Oz) pipelines using the legacy PM, there are no in-tree uses of TargetMachine::adjustPassManager remaining. This patch removes the no longer used adjustPassManager functions. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D137796	2022-11-28 10:24:16 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Jameson Nash	c4b1a63a1b	mark getTargetTransformInfo and getTargetIRAnalysis as const Seems like this can be const, since Passes shouldn't modify it. Reviewed By: wsmoses Differential Revision: https://reviews.llvm.org/D120518	2022-02-25 14:30:44 -05:00
Yonghong Song	009f3a89d8	BPF: remove intrinsics @llvm.stacksave() and @llvm.stackrestore() Paul Chaignon reported a bpf verifier failure ([1]) due to using non-ABI register R11. For the test case, llvm11 is okay while llvm12 and later generate verifier unfriendly code. The failure is related to variable length array size. The following mimics the variable length array definition in the test case: struct t { char a[20]; }; void foo(void *); int test() { const int a = 8; char tmp[AA + sizeof(struct t) + a]; foo(tmp); ... } Paul helped bisect that the following llvm commit is responsible: 552c6c232872 ("PR44406: Follow behavior of array bound constant folding in more recent versions of GCC.") Basically, before the above commit, clang frontend did constant folding for array size "AA + sizeof(struct t) + a" to be 68, so used alloca for stack allocation. After the above commit, clang frontend didn't do constant folding for array size any more, which results in a VLA and llvm.stacksave/llvm.stackrestore is generated. BPF architecture API does not support stack pointer (sp) register. The LLVM internally used R11 to indicate sp register but it should not be in the final code. Otherwise, kernel verifier will reject it. An earlier patch ([2]) tried to fix the issue in clang frontend. But the upstream discussion considered the frontend fix to be a hack and concluded that the backend should properly undo llvm.stacksave/llvm.stackrestore. This patch implemented a bpf IR phase to remove these intrinsics unconditionally. If eventually the alloca can be resolved with constant size, r11 will not be generated. If alloca cannot be resolved with constant size, SelectionDAG will complain, the same as without this patch. [1] https://lore.kernel.org/bpf/20210809151202.GB1012999@Mem/ [2] https://reviews.llvm.org/D107882 Differential Revision: https://reviews.llvm.org/D111897	2021-10-18 09:51:19 -07:00
Reid Kleckner	89b57061f7	Move TargetRegistry.(h|cpp) from Support to MC This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support. This allows us to ensure that Support doesn't have includes from MC/*. Differential Revision: https://reviews.llvm.org/D111454	2021-10-08 14:51:48 -07:00
Tarindu Jayatilaka	7a797b2902	Take OptimizationLevel class out of Pass Builder Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D107025	2021-07-29 21:57:23 -07:00
Arthur Eubanks	34a8a437bf	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
Yonghong Song	a260ae7160	BPF: Implement TTI.IntImmCost() properly This patch implemented TTI.IntImmCost() properly. Each BPF insn has 32bit immediate space, so for any immediate which can be represented as 32bit signed int, the cost is technically free. If an int cannot be represented as a 32bit signed int, a ld_imm64 instruction is needed and a TCC_Basic is returned. This change was motivated by the observation that several bpf selftests failed with latest llvm trunk, e.g., #10/16 strobemeta.o:FAIL #10/17 strobemeta_nounroll1.o:FAIL #10/18 strobemeta_nounroll2.o:FAIL #10/19 strobemeta_subprogs.o:FAIL #96 snprintf_btf:FAIL The failures occur because SpeculateAroundPHIsPass did an aggressive transformation which alters control flow in a way the verifier currently cannot handle well. In llvm12, SpeculateAroundPHIsPass is not called. SpeculateAroundPHIsPass relied on TTI.getIntImmCost() and TTI.getIntImmCostInst() for profitability analysis. This patch implemented TTI.getIntImmCost() properly for BPF backend which also prevented the transformation which caused the above test failures. Differential Revision: https://reviews.llvm.org/D96448	2021-02-11 08:35:25 -08:00
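The cost rule stated in the a260ae7160 message above (an immediate that fits in a signed 32-bit field is free, anything else needs `ld_imm64` and costs `TCC_Basic`) can be sketched as a small standalone function. This is an illustration under stated assumptions, not the code from the patch: `SketchCost`, `SketchFree`, `SketchBasic` and `sketchIntImmCost` are hypothetical names standing in for the LLVM `TargetTransformInfo` cost constants and the BPF TTI hook.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the TCC_Free / TCC_Basic cost constants
// named in the commit message.
enum SketchCost { SketchFree = 0, SketchBasic = 1 };

// Returns SketchFree when `imm` is representable as a signed 32-bit
// integer, i.e. it fits in the instruction's immediate field.
// Otherwise returns SketchBasic, modelling the extra ld_imm64.
constexpr SketchCost sketchIntImmCost(std::int64_t imm) {
  return (imm >= std::numeric_limits<std::int32_t>::min() &&
          imm <= std::numeric_limits<std::int32_t>::max())
             ? SketchFree
             : SketchBasic;
}

// Boundary checks: INT32_MAX is free, INT32_MAX + 1 is not.
static_assert(sketchIntImmCost(2147483647LL) == SketchFree);
static_assert(sketchIntImmCost(2147483648LL) == SketchBasic);
```

A profitability analysis such as the one the message attributes to SpeculateAroundPHIsPass would consult a hook of this kind to decide whether materializing a constant is free.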
Kazu Hirata	8a20e2b3d3	[llvm] Use Optional::getValueOr (NFC)	2021-01-12 21:43:50 -08:00
Arthur Eubanks	92a67e131f	[BPF][NewPM] Port bpf-adjust-opt to NPM and add it to pipeline Reviewed By: yonghong-song Differential Revision: https://reviews.llvm.org/D91990	2020-11-26 10:11:26 -08:00
Arthur Eubanks	ab0ddbc38a	Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback This allows targets to skip optional optimization passes at -O0. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D90777	2020-11-04 13:11:40 -08:00
Arthur Eubanks	9173b5a99d	Revert "[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback" This reverts commit 7a83aa0520d24ee5285a9c60b97b57a1db1d65e8. Causing buildbot failures.	2020-11-04 12:57:32 -08:00
Arthur Eubanks	7a83aa0520	[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback This allows targets to skip optional optimization passes at -O0. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D90777	2020-11-04 12:53:30 -08:00
Yonghong Song	ddf1864ace	BPF: add AdjustOpt IR pass to generate verifier friendly codes Add an IR phase right before main module optimization. This is to modify IR to restrict certain downward optimizations in order to generate verifier friendly code. > prevent certain instcombine optimizations, handling both in-block/cross-block instcombines. > avoid speculative code motion if the variable used in condition is also used in the later blocks. Internally, a bpf IR builtin result = __builtin_bpf_passthrough(seq_num, result) is used to enforce ordering. This builtin is only used during target independent IR optimizations and it will be removed at the beginning of target dependent IR optimizations. For example, removing the following workaround, --- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c +++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c @@ -47,7 +47,7 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx) /* a workaround to prevent compiler from generating * codes verifier cannot handle yet. */ - volatile int ret; + int ret; this patch is able to generate code which passed the verifier. To disable optimization, users need to use "opt" command like below: clang -target bpf -O2 -S -emit-llvm -Xclang -disable-llvm-passes test.c // disable icmp serialization opt -O2 -bpf-disable-serialize-icmp test.ll | llvm-dis > t.ll // disable avoid-speculation opt -O2 -bpf-disable-avoid-speculation test.ll | llvm-dis > t.ll llc t.ll Differential Revision: https://reviews.llvm.org/D85570	2020-10-07 08:49:10 -07:00
Arthur Eubanks	40251fee00	[BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer pipeline This involves porting BPFAbstractMemberAccess and BPFPreserveDIType to NPM, then adding them to BPFTargetMachine::registerPassBuilderCallbacks (the NPM equivalent of adjustPassManager()). Reviewed By: yonghong-song, asbirlea Differential Revision: https://reviews.llvm.org/D88855	2020-10-06 07:42:32 -07:00
Yonghong Song	54d9f743c8	BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible Move abstractMemberAccess and PreserveDIType passes as early as possible, right after clang code generation. Currently, compiler may transform the above code p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); bpf_probe_read(buf, buf_size, p2); } to p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { bpf_probe_read(buf, buf_size, p2); } and eventually assembly code looks like reloc_exist = 1; reloc_member_offset = 10; //calculate member offset from base p2 = base + reloc_member_offset; if (reloc_exist) { bpf_probe_read(bpf, buf_size, p2); } if during libbpf relocation resolution, reloc_exist is actually resolved to 0 (not exist), reloc_member_offset relocation cannot be resolved and will be patched with illegal instruction. This will cause verifier failure. This patch attempts to address this issue by do chaining analysis and replace chains with special globals right after clang code gen. This will remove the cse possibility described in the above. The IR typically looks like %6 = load @llvm.sk_buff:0:50$0:0:0:2:0 %7 = bitcast %struct.sk_buff* %2 to i8* %8 = getelementptr i8, i8* %7, %6 for a particular address computation relocation. But this transformation has another consequence, code sinking may happen like below: PHI = <possibly different @preserve__access_globals> %7 = bitcast %struct.sk_buff %2 to i8* %8 = getelementptr i8, i8* %7, %6 For such cases, we will not able to generate relocations since multiple relocations are merged into one. 
This patch introduced a passthrough builtin to prevent such optimization. Looks like inline assembly has more impact on optimization, e.g., inlining. Using passthrough has less impact on optimizations. A new IR pass is introduced at the beginning of target-dependent IR optimization, which does: - report fatal error if any reloc global in PHI nodes - remove all bpf passthrough builtin functions Changes for existing CORE tests: - for clang tests, add "-Xclang -disable-llvm-passes" flags to avoid builtin->reloc_global transformation so the test is still able to check correctness for clang generated IR. - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command before "llc" command since "opt" is needed to call newly-placed builtin->reloc_global transformation. Add target triple in the IR file since "opt" requires it. - Since target triple is added in IR file, if a test may produce different results for different endianness, two tests will be created, one for bpfeb and another for bpfel, e.g., some tests for relocation of lshift/rshift of bitfields. - field-reloc-bitfield-1.ll has different relocations compared to old codes. This is because for the structure in the test, new code returns struct layout alignment 4 while old code is 8. Align 8 is more precise and permits double load. With align 4, the new mechanism uses 4-byte load, so generating different relocations. - test intrinsic-transforms.ll is removed. This is used to test cse on intrinsics so we do not lose metadata. Now metadata is attached to global and not instruction, it won't get lost with cse. Differential Revision: https://reviews.llvm.org/D87153	2020-09-28 16:56:22 -07:00
Yonghong Song	87cba43402	BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization The following bpf linux kernel selftest failed with latest llvm: $ ./test_progs -n 7/10 ... The sequence of 8193 jumps is too complex. verification time 126272 usec stack depth 320 processed 114799 insns (limit 1000000) ... libbpf: failed to load object 'pyperf600_nounroll.o' test_bpf_verif_scale:FAIL:110 #7/10 pyperf600_nounroll.o:FAIL #7 bpf_verif_scale:FAIL After some investigation, I found the following llvm patch https://reviews.llvm.org/D84108 is responsible. The patch disabled hoisting common instructions in SimplifyCFG by default. Later on, the code changes and a SimplifyCFG phase with hoisting on cannot do the work any more. A test is provided to demonstrate the problem. The IR before simplifyCFG looks like: for.cond: %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %cmp = icmp ult i32 %i.0, 6 br i1 %cmp, label %for.body, label %for.cond.cleanup for.cond.cleanup: %2 = load i8, i8* %frame_ptr, align 8, !tbaa !2 %cmp2 = icmp eq i8* %2, null %conv = zext i1 %cmp2 to i32 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3 call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3 ret i32 %conv for.body: %3 = load i8, i8* %frame_ptr, align 8, !tbaa !2 %tobool.not = icmp eq i8* %3, null br i1 %tobool.not, label %for.inc, label %land.lhs.true The first two insns of `for.cond.cleanup` and `for.body`, load and icmp, can be hoisted to `for.cond` block. With Patch D84108, the optimization is delayed. But unfortunately, later on loop rotation added additional phi nodes to `for.body` and hoisting cannot be done any more. Note such a hoisting is beneficial to bpf programs as bpf verifier does path sensitive analysis and verification. The hoisting prevents reloading from stack which will assume conservative value and increase exploited insns. In this case, it caused verifier failure. 
To fix this problem, I added an IR pass from bpf target to perform an additional simplifycfg with hoisting common inst enabled. Differential Revision: https://reviews.llvm.org/D85434	2020-08-06 13:16:00 -07:00
Yonghong Song	6b01b46538	[BPF] preserve debuginfo types for builtin __builtin__btf_type_id() The builtin function u32 btf_type_id = __builtin_btf_type_id(param, 0) can help preserve type info for the following use case: extern void foo(..., void *data, int size); int test(...) { struct t { int a; int b; int c; } d; d.a = ...; d.b = ...; d.c = ...; foo(..., &d, sizeof(d)); } The function "foo" in the above only see raw data and does not know what type of the data is. In certain cases, e.g., logging, the additional type information will help pretty print. This patch handles the builtin in BPF backend. It includes an IR pass to translate the IR intrinsic to a load of a global variable which carries the metadata, and an MI pass to remove the intermediate load of the global variable. Finally, in AsmPrinter pass, proper instruction are generated. In the above example, the second argument for __builtin_btf_type_id() is 0, which means a relocation for local adjustment, i.e., w.r.t. bpf program BTF change, will be generated. The value 1 for the second argument means a relocation for remote adjustment, e.g., against vmlinux. Differential Revision: https://reviews.llvm.org/D74572	2020-05-15 08:00:44 -07:00


86 Commits