llvm-project

Author	SHA1	Message	Date
Yingchi Long	70deb7bfe9	[BPF] expand cttz, ctlz for i32, i64 (#73668 ) Fixes: https://github.com/llvm/llvm-project/issues/62252 Depends on: #73667	2024-04-01 10:57:54 +08:00
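For readers unfamiliar with what "expanding" cttz and ctlz means: a target without native count-zeros instructions can compute them from simpler bit operations. The C sketch below shows one well-known expansion for 32-bit values. It is an illustration only, not the sequence the BPF backend emits, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count (number of set bits); hypothetical helper. */
unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count trailing zeros: ~x & (x - 1) keeps exactly the bits below the
 * lowest set bit, so its population count is the answer (32 for x == 0). */
unsigned cttz32(uint32_t x)
{
    return popcount32(~x & (x - 1));
}

/* Count leading zeros: smear the highest set bit into every lower
 * position, then count the zero bits that remain (32 for x == 0). */
unsigned ctlz32(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return popcount32(~x);
}

/* A few spot checks of the identities above. */
void check_bit_counts(void)
{
    assert(cttz32(8u) == 3);
    assert(ctlz32(1u) == 31);
    assert(cttz32(0u) == 32 && ctlz32(0u) == 32);
}
```

Both functions return the bit width for a zero input, which matches the defined-at-zero flavour of these operations.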
eddyz87	65b123e287	[BPF] rename 'arena' to 'address_space' (#85161 ) There are a few places where `arena` name is used for pointers in non-zero address space in BPF backend, rename these to use a more generic `address_space`: - macro `__BPF_FEATURE_ARENA_CAST` -> `__BPF_FEATURE_ADDR_SPACE_CAST` - name for arena global variables section `.arena.N` -> `.addr_space.N`	2024-03-14 19:20:06 -07:00
4ast	2aacb56e83	BPF address space insn (#84410 ) This commit aims to support BPF arena kernel side [feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/): - arena is a memory region accessible from both BPF program and userspace; - base pointers for this memory region differ between kernel and user spaces; - `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)` translates src_reg, a pointer in src_addr_space to dst_reg, equivalent pointer in dst_addr_space, {src,dst}_addr_space are immediate constants; - number 0 is assigned to kernel address space; - number 1 is assigned to user address space. On the LLVM side, the goal is to make load and store operations on arena pointers "transparent" for BPF programs: - assume that pointers with non-zero address space are pointers to arena memory; - assume that arena is identified by address space number; - assume that address space zero corresponds to kernel address space; - assume that every BPF-side load or store from arena is done via pointer in user address space, thus convert base pointers using `addr_space_cast(src_reg, 0, 1)`; Only load, store, cmpxchg and atomicrmw IR instructions are handled by this transformation. For example, the following C code: ```c #define __as __attribute__((address_space(1))) void copy(int __as *from, int __as *to) { *to = *from; } ``` Compiled to the following IR: ```llvm define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) { entry: %0 = load i32, ptr addrspace(1) %from, align 4 store i32 %0, ptr addrspace(1) %to, align 4 ret void } ``` Is transformed to: ```llvm %to2 = addrspacecast ptr addrspace(1) %to to ptr ;; ! %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; ! 
%0 = load i32, ptr %from1, align 4, !tbaa !3 store i32 %0, ptr %to2, align 4, !tbaa !3 ret void ``` And compiled as: ```asm r2 = addr_space_cast(r2, 0, 1) r1 = addr_space_cast(r1, 0, 1) r1 = *(u32 *)(r1 + 0) *(u32 *)(r2 + 0) = r1 exit ``` Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>	2024-03-13 02:27:25 +02:00
Nikita Popov	ff9af4c43a	[CodeGen] Convert tests to opaque pointers (NFC)	2024-02-05 14:07:09 +01:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
James Y Knight	b856e77b2d	Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703 ) Targets affected: - NVPTX and BPF: set to 64 bits. - ARC, Lanai, and MSP430: set to 0 (they don't implement atomics). Those which didn't yet add AtomicExpandPass to their pass pipeline now do so. This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass. On all these targets, this now matches what Clang already does in the frontend. The only targets which do not configure AtomicExpandPass now are: - DirectX and SPIRV: they aren't normal backends. - AVR: a single-cpu architecture with no privileged/user divide, which could implement all atomics by disabling/enabling interrupts, regardless of size/alignment. Will be addressed by future work.	2024-01-08 22:34:28 -05:00
Yingwei Zheng	1228becf7d	[FuncAttrs] Deduce `noundef` attributes for return values (#76553 ) This patch deduces `noundef` attributes for return values. IIUC, a function returns `noundef` values iff all of its return values are guaranteed not to be `undef` or `poison`. Definition of `noundef` from LangRef: ``` noundef This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation. ``` Alive2: https://alive2.llvm.org/ce/z/g8Eis6 Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|+0.01%\|+0.01%\|-0.01%\|+0.01%\|+0.03%\|-0.04%\|+0.01%\| The motivation of this patch is to reduce the number of `freeze` insts and enable more optimizations.	2023-12-31 20:44:48 +08:00
Yingchi Long	ddf85b92aa	[BPF] improve error handling by custom lowering & fail() (#75088 ) Currently, on mcpu=v3 the sdiv and srem instructions are not supported, and the backend crashes with a stack trace and core dump, which is misleading for end users because this is not a "bug". Add LLVM error reporting for sdiv/srem in the ISel legalize-op phase. For the clang frontend this gives a detailed location and error report. $ build/bin/clang -g -target bpf -c local/sdiv.c local/sdiv.c:1:35: error: unsupported signed division, please convert to unsigned div/mod. 1 \| int sdiv(int a, int b) { return a / b; } \| ^ 1 error generated. Fixes: #70433 Fixes: #48647 This also improves error handling for dynamic stack allocation: local/vla.c:2:3: error: unsupported dynamic stack allocation 2 \| int b[n]; \| ^ 1 error generated. Fixes: https://github.com/llvm/llvm-project/issues/57171	2023-12-13 13:41:52 +08:00
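The diagnostic in this commit asks users to "convert to unsigned div/mod". A minimal sketch of such a conversion in plain C follows, assuming 32-bit operands; the helper names are hypothetical and this is not code from the commit. It divides the magnitudes as unsigned values and restores the sign afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Magnitude of a as an unsigned value; well defined even for INT32_MIN. */
uint32_t magnitude32(int32_t a)
{
    return a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
}

/* Signed quotient (truncating toward zero) using only unsigned division. */
int32_t sdiv_via_udiv(int32_t a, int32_t b)
{
    assert(b != 0); /* division by zero is the caller's responsibility */
    uint32_t q = magnitude32(a) / magnitude32(b);
    /* The quotient is negative when exactly one operand is negative. */
    if ((a < 0) != (b < 0))
        q = 0u - q;
    /* Conversion back to signed assumes two's complement wrap-around. */
    return (int32_t)q;
}

/* Signed remainder; like C's %, the result takes the sign of the dividend. */
int32_t srem_via_urem(int32_t a, int32_t b)
{
    assert(b != 0);
    uint32_t r = magnitude32(a) % magnitude32(b);
    if (a < 0)
        r = 0u - r;
    return (int32_t)r;
}
```

For example, `sdiv_via_udiv(-7, 2)` yields -3 and `srem_via_urem(-7, 2)` yields -1, the same results as C's `/` and `%`. The `assert` calls are host-side checks only; a real BPF program would need its own handling of a zero divisor.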
Yingchi Long	c4ac1d239f	[BPF][GlobalISel] select non-PreISelGenericOpcode (#75034 ) This selects non-PreISelGenericOpcode as-is. Depends on: #74999 Co-authored-by: Origami404 <Origami404@foxmail.com>	2023-12-12 16:19:34 +08:00
Yingchi Long	2460bf2fac	[BPF][GlobalISel] add initial gisel support for BPF (#74999 ) This adds initial GlobalISel codegen support for the BPF backend. Only the IR translator for "RET" is implemented; instruction selection is not supported yet. Depends on: #74998	2023-12-11 19:58:34 +08:00
Eduard Zingerman	030b8cb156	[BPF] Attribute preserve_static_offset for structs This commit adds a new BPF specific structure attribute `__attribute__((preserve_static_offset))` and a pass to deal with it. This attribute may be attached to a struct or union declaration, where it notifies the compiler that this structure is a "context" structure. The following limitations apply to context structures: - runtime environment might patch access to the fields of this type by updating the field offset; BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed. This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions. See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details. - runtime environment might disallow access to the field of the type through modified pointers. During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. If a register with such a tag is modified, BPF programs are not allowed to read or write memory using that register. See `kernel/bpf/verifier.c:check_mem_access` function in the Linux kernel source tree for details. Access to the structure fields is translated to IR as a sequence: - `(load (getelementptr %ptr %offset))` or - `(store (getelementptr %ptr %offset))` During instruction selection phase such sequences are translated as a single load instruction with embedded offset, e.g. `LDW %ptr, %offset`, which matches access pattern necessary for the restricted set of types described above (when `%offset` is static). 
Multiple optimizer passes might separate these instructions, this includes: - SimplifyCFGPass (sinking) - InstCombine (sinking) - GVN (hoisting) The `preserve_static_offset` attribute marks structures for which the following transformations happen: - at the early IR processing stage: - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`; - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`; - at the late IR processing stage this modification is undone. Such handling prevents various optimizer passes from generating sequences of instructions that would be rejected by BPF verifier. The `__attribute__((preserve_static_offset))` has a priority over `__attribute__((preserve_access_index))`. When the preserve_static_offset attribute is present, preserve access index transformations are not applied. This addresses the issue reported by the following thread: https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 This is a second attempt to commit this change, previous reverted commit is: cb13e9286b6d4e384b5d4203e853d44e2eff0f0f. The following items had been fixed: - test case bpf-preserve-static-offset-bitfield.c now uses `-triple bpfel` to avoid different codegen for little/big endian targets. - BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid use after free for `WorkList` elements `V`. Differential Revision: https://reviews.llvm.org/D133361	2023-12-05 19:21:42 +02:00
Eduard Zingerman	2484469803	Revert "[BPF] Attribute preserve_static_offset for structs" This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f. Buildbot reports MSAN failures in tests added in this commit: https://lab.llvm.org/buildbot/#/builders/5/builds/38806 Failing tests: LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll	2023-11-30 22:29:45 +02:00
Eduard Zingerman	cb13e9286b	[BPF] Attribute preserve_static_offset for structs This commit adds a new BPF specific structure attribute `__attribute__((preserve_static_offset))` and a pass to deal with it. This attribute may be attached to a struct or union declaration, where it notifies the compiler that this structure is a "context" structure. The following limitations apply to context structures: - runtime environment might patch access to the fields of this type by updating the field offset; BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed. This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions. See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details. - runtime environment might disallow access to the field of the type through modified pointers. During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. If a register with such a tag is modified, BPF programs are not allowed to read or write memory using that register. See `kernel/bpf/verifier.c:check_mem_access` function in the Linux kernel source tree for details. Access to the structure fields is translated to IR as a sequence: - `(load (getelementptr %ptr %offset))` or - `(store (getelementptr %ptr %offset))` During instruction selection phase such sequences are translated as a single load instruction with embedded offset, e.g. `LDW %ptr, %offset`, which matches access pattern necessary for the restricted set of types described above (when `%offset` is static). 
Multiple optimizer passes might separate these instructions, this includes: - SimplifyCFGPass (sinking) - InstCombine (sinking) - GVN (hoisting) The `preserve_static_offset` attribute marks structures for which the following transformations happen: - at the early IR processing stage: - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`; - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`; - at the late IR processing stage this modification is undone. Such handling prevents various optimizer passes from generating sequences of instructions that would be rejected by BPF verifier. The `__attribute__((preserve_static_offset))` has a priority over `__attribute__((preserve_access_index))`. When the preserve_static_offset attribute is present, preserve access index transformations are not applied. This addresses the issue reported by the following thread: https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 Differential Revision: https://reviews.llvm.org/D133361	2023-11-30 19:45:03 +02:00
Philip Reames	a7f35d54ee	[SCEV] Extend isImpliedCondOperandsViaRanges to independent predicates (#71110 ) As far as I can tell, there's nothing in this code which actually assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred RHS) are the same. Noticed while investigating something else, this is purely an opportunistic optimization while I'm looking at the code. Unfortunately, this doesn't solve my original problem. :)	2023-11-07 07:25:47 -08:00
yonghong-song	32e35b21b5	[BPF] Skip modifiers for __builtin_btf_type_id() local type (#71094 ) BPF upstream reported an inconsistent behavior w.r.t. BPF_TYPE_ID_LOCAL vs. BPF_TYPE_ID_TARGET (or BPF_TYPE_ID_REMOTE in LLVM terminology). For BPF_TYPE_ID_TARGET, all modifiers (like 'const' and 'volatile') are ignored in the final type encoding. For example, for type 'const struct foo', the eventual encoding in BTF relocation is 'struct foo'. This allows libbpf to match corresponding kernel types without considering any modifiers. The current behavior for BPF_TYPE_ID_LOCAL is different. It will encode 'const struct foo' in BTF relocation and such discrepancy confused users ([1]). This patch fixes this discrepancy by making BPF_TYPE_ID_LOCAL BTF type representation the same as BPF_TYPE_ID_TARGET. This should have minimum user impact since ultimately the user wants to get a real type, not a 'const' type modifier. The selftest builtin-btf-type-id-2.ll is used to test BPF_TYPE_ID_TARGET with 'const' modifier. Adapt the same test for BPF_TYPE_ID_LOCAL. And the below diff shows now both BPF_TYPE_ID_LOCAL and BPF_TYPE_ID_TARGET produce the same type: $ diff test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll --- test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll 2023-07-30 16:58:20.657528310 -0700 +++ test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll 2023-11-02 10:23:25.356959008 -0700 @@ -6,7 +6,7 @@ ; int a; ; }; ; int test(void) { -; return __builtin_btf_type_id(*(const struct s *)0, 1); +; return __builtin_btf_type_id(*(const struct s *)0, 0); ; } ; Compilation flag: ; clang -target bpf -O2 -g -S -emit-llvm -Xclang -disable-llvm-passes test.c $ [1] https://lore.kernel.org/bpf/CAN+4W8h3yDjkOLJPiuKVKTpj_08pBz8ke6vN=Lf8gcA=iYBM-g@mail.gmail.com/ Co-authored-by: Yonghong Song <yonghong.song@linux.dev>	2023-11-03 12:52:16 -07:00
Philip Reames	f6f769203d	[tests] Autogenerate a couple of tests As usual, making it easier for an upcoming test delta to be seen. Note that several of these are examples of extremely bad testing practice. Checking internal debug output (for no real purpose), and checking the result of a fully O2 + llc run instead of reducing the specific problematic pass.	2023-11-03 08:42:23 -07:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is currently not possible is to differentiate an explicit `target datalayout = ""` in the IR file from a missing datalayout, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Nikita Popov	2ad9fde418	[MemDep] Use EarliestEscapeInfo (#69727 ) Use BatchAA with EarliestEscapeInfo instead of callCapturesBefore() in MemDepAnalysis. The advantage of this is that it will also take not-captured-before information into account for non-calls (see test_store_before_capture for a representative example), and that this is a cached analysis. The disadvantage is that EII is slightly less precise than full CapturedBefore analysis. In practice the impact is positive, with gvn.NumGVNLoad going from 22022 to 22808 on test-suite. The impact to compile-time is also positive, mainly in the ThinLTO configuration.	2023-10-23 09:57:26 +02:00
Nikita Popov	a72d88fb4f	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 8840da2db237cd714d975c199d5992945d2b71e9. This results in verifier failures during LTO, see #68929.	2023-10-16 12:17:24 +02:00
Nikita Popov	8840da2db2	Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size Reapply now that generation of incorrect debuginfo for FnDef in rustc has been fixed. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-10-09 14:22:12 +02:00
Alex Richardson	83c4227ab7	Auto-generate test checks for tests affected by D141060 These files had manual CHECK lines which make the diff from D141060 very difficult to review.	2023-10-04 10:51:35 -07:00
Nikita Popov	38c59b9f53	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 47324cfd7d8ca1a2a5cbb9f948ecff66a28ee6bc. This exposed incorrect debuginfo in rustc. Revert the verification until this has been fixed.	2023-09-18 17:24:53 +02:00
Nikita Popov	47324cfd7d	Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size Reapply after fixing a clang bug this exposed in D158972 and adjusting a number of tests that failed for 32-bit targets. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-09-15 14:51:50 +02:00
Fangrui Song	806761a762	[test] Change llc -march= to -mtriple= The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. riscv64-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly.	2023-09-11 14:42:37 -07:00
Nikita Popov	98cf20f890	Revert "[Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 183f49c3e0f4a7facf237581f83ae07e7f4544ab. The lang/cpp/trivial_abi/TestTrivialABI.py lldb test fails on buildbots.	2023-08-28 09:44:51 +02:00
Nikita Popov	183f49c3e0	[Verifier] Sanity check alloca size against DILocalVariable fragment size Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-08-28 09:16:33 +02:00
Eduard Zingerman	651e644595	[BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree() Replace `BPFMIPeepholeTruncElim` by adding an overload for `TargetLowering::isZExtFree()` aware that zero extension is free for `ISD::LOAD`. Short description ================= The `BPFMIPeepholeTruncElim` handles two patterns: Pattern #1: %1 = LDB %0, ... %1 = LDB %0, ... %2 = AND_ri %1, 0xff -> %2 = MOV_ri %1 <-- (!) Pattern #2: bb.1: bb.1: %a = LDB %0, ... %a = LDB %0, ... br %bb3 br %bb3 bb.2: bb.2: %b = LDB %0, ... -> %b = LDB %0, ... br %bb3 br %bb3 bb.3: bb.3: %1 = PHI %a, %b %1 = PHI %a, %b %2 = AND_ri %1, 0xff %2 = MOV_ri %1 <-- (!) Plus variations: - AND_ri_32 instead of AND_ri - SLL/SLR instead of AND_ri - LDH, LDW, LDB32, LDH32, LDW32 Both patterns could be handled by built-in transformations at instruction selection phase if suitable `isZExtFree()` implementation is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`. When evaluating on BPF kernel selftests and remove_truncate_.ll LLVM test cases this revisions performs slightly better than BPFMIPeepholeTruncElim, see "Impact" section below for details. Commit also adds a few test cases to make sure that patterns in question are handled. Long description ================ Why this works: Pattern #1 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 %cond = icmp eq i8 %a, 0 ret i1 %cond } Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command: ... Type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 t19: i64 = and t16, Constant:i64<255> t17: i64 = setcc t19, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... 
Replacing.1 t19: i64 = and t16, Constant:i64<255> With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 and 0 other values ... Optimized type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t17: i64 = setcc t20, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Optimized type-legalized selection DAG: - `t19 = and t16, 255` had been replaced by `t16` (load). - Patterns like `(and (load ... i8), 255)` are replaced by `load` in `DAGCombiner::BackwardsPropagateMask` called from `DAGCombiner::visitAND`. - Similarly patterns like `(shl (srl ..., 56), 56)` are replaced by `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge, look for `TLI.shouldFoldConstantShiftPairToMask()` call). Why this works: Pattern #2 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 br label %next next: %cond = icmp eq i8 %a, 0 ret i1 %cond } Consider log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command. Log for first basic block: Initial selection DAG: %bb.0 'foo:entry' SelectionDAG has 9 nodes: t0: ch,glue = EntryToken t3: i64 = Constant<0> t2: i64,ch = CopyFromReg t0, Register:i64 %1 t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64 t6: i64 = zero_extend t5 t8: ch = CopyToReg t0, Register:i64 %0, t6 ... Replacing.1 t6: i64 = zero_extend t5 With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 and 0 other values ... 
Optimized lowered selection DAG: %bb.0 'foo:entry' SelectionDAG has 7 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %1 t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t8: ch = CopyToReg t0, Register:i64 %0, t9 Note: - Initial selection DAG: - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))` w/o special `isZExtFree()` overload added by this commit it is instead lowered as `t6 = (any_extend (load ...))`. - The decision to generate `zero_extend` or `any_extend` is done in `RegsForValue::getCopyToRegs` called from `SelectionDAGBuilder::CopyValueToVirtualRegister`: - if `isZExtFree()` for load returns true `zero_extend` is used; - `any_extend` is used otherwise. - Optimized lowered selection DAG: - `t6 = (any_extend (load ...))` is replaced by `t9 = load ..., zext from i8` This is done by `DagCombiner.cpp:tryToFoldExtOfLoad()` called from `DAGCombiner::visitZERO_EXTEND`. Log for second basic block: Initial selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t5: i8 = truncate t4 t8: i1 = setcc t5, Constant:i8<0>, seteq:ch t9: i64 = any_extend t8 t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Replacing.2 t18: i64 = and t4, Constant:i64<255> With: t4: i64 = AssertZext t2, ValueType:ch:i8 ... Type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t18: i64 = and t4, Constant:i64<255> t16: i64 = setcc t18, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... 
Optimized type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t16: i64 = setcc t4, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Initial selection DAG: - `t0` is an input value for this basic block, it corresponds to the load instruction (`t9`) from the first basic block. - It is accessed within basic block via `t4` (AssertZext (CopyFromReg t0, ...)). - The `AssertZext` is generated by RegsForValue::getCopyFromRegs called from SelectionDAGBuilder::getCopyFromRegs, it is generated only when `LiveOutInfo` with known number of leading zeros is present for `t0`. - Known register bits in `LiveOutInfo` are computed by `SelectionDAG::computeKnownBits` called from `SelectionDAGISel::ComputeLiveOutVRegInfo`. - `computeKnownBits()` generates leading zeros information for `(load ..., zext from ...)` but *does not* generate leading zeros information for `(load ..., anyext from ...)`. This is why `isZExtFree()` added in this commit is important. - Type-legalized selection DAG: - `t5 = truncate t4` is replaced by `t18 = and t4, 255` - Optimized type-legalized selection DAG: - `t18 = and t4, 255` is replaced by `t4`, this is done by `DAGCombiner::SimplifyDemandedBits` called from `DAGCombiner::visitAND`, which simplifies patterns like `(and (assertzext ...))` Impact ------ This change covers all remove_truncate_*.ll test cases: - for -mcpu=v4 there are no changes in the generated code; - for -mcpu=v2 code generated for remove_truncate_7 and remove_truncate_8 improved slightly, for other tests it is unchanged. 
For remove_truncate_7: Before this revision After this revision -------------------- ------------------- r1 <<= 0x20 r1 <<= 0x20 r1 >>= 0x20 r1 >>= 0x20 if r1 == 0x0 goto +0x2 <LBB0_2> if r1 == 0x0 goto +0x2 <LBB0_2> r1 = *(u32 *)(r2 + 0x0) r0 = *(u32 *)(r2 + 0x0) goto +0x1 <LBB0_3> goto +0x1 <LBB0_3> <LBB0_2>: <LBB0_2>: r1 = *(u32 *)(r2 + 0x4) r0 = *(u32 *)(r2 + 0x4) <LBB0_3>: <LBB0_3>: r0 = r1 exit exit For remove_truncate_8: Before this revision After this revision -------------------- ------------------- r2 = *(u32 *)(r1 + 0x0) r2 = *(u32 *)(r1 + 0x0) r3 = r2 r3 = r2 r3 <<= 0x20 r3 <<= 0x20 r4 = r3 r3 s>>= 0x20 r4 s>>= 0x20 if r4 s> 0x2 goto +0x5 <LBB0_3> if r3 s> 0x2 goto +0x4 <LBB0_3> r4 = *(u32 *)(r1 + 0x4) r3 = *(u32 *)(r1 + 0x4) r3 >>= 0x20 if r3 >= r4 goto +0x2 <LBB0_3> if r2 >= r3 goto +0x2 <LBB0_3> r2 += 0x2 r2 += 0x2 *(u32 *)(r1 + 0x0) = r2 *(u32 *)(r1 + 0x0) = r2 <LBB0_3>: <LBB0_3>: r0 = 0x3 r0 = 0x3 exit exit For kernel BPF selftests statistics is as follows: (-mcpu=v4): - For -mcpu=v4: 9 out of 655 object files have differences, in all cases total number of instructions marginally decreased (-27 instructions). - For -mcpu=v2: 9 out of 655 object files have differences: - For 19 object files number of instructions decreased (-129 instructions in total): some redundant `rX &= 0xffff` and register to register assignments removed; - For 2 object files number of instructions increased +2 instructions in each file. Both -mcpu=v2 instruction increases could be reduced to the same example: define void @foo(ptr %p) { entry: %a = load i32, ptr %p, align 4 %b = sext i32 %a to i64 %c = icmp ult i64 1, %b br i1 %c, label %next, label %end next: call void inttoptr (i64 62 to ptr)(i32 %a) br label %end end: ret void } Note that this example uses value loaded to `%a` both as a sign extended (`%b`) and as zero extended (`%a` passed as parameter). 
Here is the difference in final assembly code: Before this revision After this revision -------------------- ------------------- r1 = *(u32 *)(r1 + 0) r1 = *(u32 *)(r1 + 0) r1 <<= 32 r1 <<= 32 r1 s>>= 32 r1 s>>= 32 if r1 < 2 goto <LBB0_2> if r1 < 2 goto <LBB0_2> r1 <<= 32 r1 >>= 32 call 62 call 62 <LBB0_2>: <LBB0_2>: exit exit Before this commit `%a` is passed to call as a sign extended value, after this commit `%a` is passed to call as a zero extended value, both are correct as 32-bit sub-register is the same. The difference comes from `DAGCombiner` operation on the initial DAG: Initial selection DAG before this commit: t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 t6: i64 = any_extend t5 <--------------------- (1) t8: ch = CopyToReg t0, Register:i64 %0, t6 t9: i64 = sign_extend t5 t12: i1 = setcc Constant:i64<1>, t9, setult:ch Initial selection DAG after this commit: t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 t6: i64 = zero_extend t5 <--------------------- (2) t8: ch = CopyToReg t0, Register:i64 %0, t6 t9: i64 = sign_extend t5 t12: i1 = setcc Constant:i64<1>, t9, setult:ch The node `t9` is processed before node `t6` and `load` instruction is combined to load with sign extension: Replacing.1 t9: i64 = sign_extend t5 With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64 and 0 other values Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64 With: t31: i32 = truncate t30 and 1 other values This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from `DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6` which is `any_extend` in (1) and `zero_extend` in (2). `tryToFoldExtOfLoad()` rewrites such uses of `t5` differently: - `any_extend` is simply removed - `zero_extend` is replaced by `and t30, 0xffffffff`, which is later converted to a pair of shifts. This pair of shifts survives till the end of translation. 
Differential Revision: https://reviews.llvm.org/D157870	2023-08-22 00:04:51 +03:00
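The claim in the commit above that sign-extended and zero-extended values agree on the 32-bit sub-register can be checked with a small sketch. The helpers `zext32`, `sext32` and `subreg32` are hypothetical names introduced only for illustration; they model the shift pairs from the assembly listing and are not part of LLVM.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM APIs) modelling BPF shift pairs on a
// 64-bit register value.

// Models `r <<= 32; r >>= 32`: zero-extends the low 32 bits.
constexpr std::uint64_t zext32(std::uint64_t r) {
  return (r << 32) >> 32;
}

// Models `r <<= 32; r s>>= 32`: sign-extends the low 32 bits.
constexpr std::uint64_t sext32(std::uint64_t r) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(
      static_cast<std::int32_t>(static_cast<std::uint32_t>(r))));
}

// The 32-bit sub-register view of a 64-bit register.
constexpr std::uint32_t subreg32(std::uint64_t r) {
  return static_cast<std::uint32_t>(r);
}

// A loaded value with the top bit of the low word set: the two
// extensions differ in the upper 32 bits only.
static_assert(sext32(0x80000001u) == 0xffffffff80000001ull, "sign extension");
static_assert(zext32(0x80000001u) == 0x0000000080000001ull, "zero extension");
static_assert(subreg32(sext32(0x80000001u)) == subreg32(zext32(0x80000001u)),
              "32-bit sub-register is identical");
```

A callee that reads only the 32-bit sub-register therefore sees the same argument either way, which is why both code sequences are correct.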
Eduard Zingerman	8f28e8069c	[BPF] support for BPF_ST instruction in codegen Generate store immediate instruction when CPUv4 is enabled. For example: $ cat test.c struct foo { unsigned char b; unsigned short h; unsigned int w; unsigned long d; }; void bar(volatile struct foo *p) { p->b = 1; p->h = 2; p->w = 3; p->d = 4; } $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d - ... 0000000000000000 <bar>: 0: 72 01 00 00 01 00 00 00 *(u8 *)(r1 + 0x0) = 0x1 1: 6a 01 02 00 02 00 00 00 *(u16 *)(r1 + 0x2) = 0x2 2: 62 01 04 00 03 00 00 00 *(u32 *)(r1 + 0x4) = 0x3 3: 7a 01 08 00 04 00 00 00 *(u64 *)(r1 + 0x8) = 0x4 4: 95 00 00 00 00 00 00 00 exit Take special care to: - apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST - validate immediate value when BPF_ST write is 64-bit: BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such write when immediate is -1, but it is incorrect to generate such write when immediate is +0xffff_ffff. This commit was previously reverted in e66affa17e32. The reason for revert was an unrelated bug in BPF backend, triggered by test case added in this commit if LLVM is built with LLVM_ENABLE_EXPENSIVE_CHECKS. The bug was fixed in D157806. Differential Revision: https://reviews.llvm.org/D140804	2023-08-16 17:51:28 +03:00
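The immediate validation described in the commit above can be sketched as follows. This is a minimal illustration, assuming only what the message states (a 64-bit `BPF_ST` store sign-extends its 32-bit immediate); `fitsSignExtendedImm32` and `storedDoubleWord` are hypothetical names, not functions from the BPF backend.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a 64-bit constant survives being encoded
// as a 32-bit immediate that the store sign-extends back to 64 bits.
constexpr bool fitsSignExtendedImm32(std::int64_t value) {
  return value >= std::numeric_limits<std::int32_t>::min() &&
         value <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: the 64-bit value written by a double-word store
// whose 32-bit immediate is sign-extended.
constexpr std::uint64_t storedDoubleWord(std::int32_t imm) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(imm));
}

// -1 is representable: the store writes all ones, as intended.
static_assert(fitsSignExtendedImm32(-1), "-1 is a valid immediate");
static_assert(storedDoubleWord(-1) == 0xffffffffffffffffull, "writes all ones");

// +0xffff_ffff is not: its 32-bit encoding equals that of -1, so the
// store would write 0xffff_ffff_ffff_ffff instead of 0x0000_0000_ffff_ffff.
static_assert(!fitsSignExtendedImm32(0xffffffffLL), "+0xffffffff must be rejected");
```

A constant that fails the check would have to be materialized in a register and stored from there rather than through a store-immediate instruction.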
Eduard Zingerman	08d92dedd2	[BPF] Fix in/out argument constraints for CORE_MEM instructions When LLVM is built with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option the following C code snippet: struct t { int a; } __attribute__((preserve_access_index)); void test(struct t *t) { t->a = 42; } Causes an assertion: $ clang -g -O2 -c --target=bpf -mcpu=v2 t.c -o /dev/null Function Live Ins: $r1 in %0 bb.0.entry: liveins: $r1 DBG_VALUE $r1, $noreg, !"t", ... %0:gpr = COPY $r1 DBG_VALUE %0:gpr, $noreg, !"t", ... %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr %4:gpr = MOV_ri 42 CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... RET debug-location !25; t.c:7:1 Bad machine code: Explicit definition marked as use * - function: test - basic block: %bb.0 entry (0x6210000d8a90) - instruction: CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... - operand 0: killed %4:gpr This happens because `CORE_MEM` instruction is defined to have output operands: def CORE_MEM : TYPE_LD_ST<BPF_MEM.Value, BPF_W.Value, (outs GPR:$dst), (ins u64imm:$opcode, GPR:$src, u64imm:$offset), "$dst = core_mem($opcode, $src, $offset)", []>; As documented in [1]: > By convention, the LLVM code generator orders instruction operands > so that all register definitions come before the register uses, even > on architectures that are normally printed in other orders. In other words, the first argument for `CORE_MEM` is considered to be a "def", while in reality it is a "use": %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr %4:gpr = MOV_ri 42 '---------------. v CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... Here is how `CORE_MEM` is constructed in `BPFMISimplifyPatchable::checkADDrr()`: BuildMI(*DefInst->getParent(), *DefInst, DefInst->getDebugLoc(), TII->get(COREOp)) .add(DefInst->getOperand(0)).addImm(Opcode).add(*BaseOp) .addGlobalAddress(GVal); Note that first operand is constructed as `.add(DefInst->getOperand(0))`. 
For `LD{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a destination register of a load, so instruction is constructed in accordance with `outs` declaration. For `ST{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a source register of a store (value to be stored), so instruction violates the `outs` declaration. This commit fixes the issue by splitting `CORE_MEM` in three instructions: `CORE_ST`, `CORE_LD64`, `CORE_LD32` with correct `outs` specifications. [1] https://llvm.org/docs/CodeGenerator.html#the-machineinstr-class Differential Revision: https://reviews.llvm.org/D157806	2023-08-15 02:34:21 +03:00
Eduard Zingerman	27026fe563	[BPF] Reset machine register kill mark in BPFMISimplifyPatchable When LLVM is built with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option the following C code snippet: struct t { unsigned long a; } __attribute__((preserve_access_index)); void foo(volatile struct t *t, volatile unsigned long *p) { *p = t->a; *p = t->a; } Causes an assertion: $ clang -g -O2 -c --target=bpf -mcpu=v2 t2.c -o /dev/null # After BPF PreEmit SimplifyPatchable # Machine code for function foo: IsSSA, TracksLiveness Function Live Ins: $r1 in %0, $r2 in %1 bb.0.entry: liveins: $r1, $r2 DBG_VALUE $r1, $noreg, !"t", !DIExpression() DBG_VALUE $r2, $noreg, !"p", !DIExpression() %1:gpr = COPY $r2 DBG_VALUE %1:gpr, $noreg, !"p", !DIExpression() %0:gpr = COPY $r1 DBG_VALUE %0:gpr, $noreg, !"t", !DIExpression() %2:gpr = LD_imm64 @"llvm.t:0:0$0:0" %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr %5:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0" STD killed %5:gpr, %1:gpr, 0 %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr %8:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0" STD killed %8:gpr, %1:gpr, 0 RET # End machine code for function foo. * Bad machine code: Using a killed virtual register * - function: foo - basic block: %bb.0 entry (0x6210000e6690) - instruction: %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr - operand 2: killed %2:gpr This happens because of the way BPFMISimplifyPatchable::processDstReg() updates second operand of the `ADD_rr` instruction. Code before `BPFMISimplifyPatchable`:

 .-> %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
 |
 |`----------------.
 |   %3:gpr = LDD %2:gpr, 0
 |   %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %3:gpr <--- (1)
 |   %5:gpr = LDD killed %4:gpr, 0       ^^^^^^^^^^^^^
 |   STD killed %5:gpr, %1:gpr, 0        this is updated
  `----------------.
     %6:gpr = LDD %2:gpr, 0
     %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %6:gpr <--- (2)
     %8:gpr = LDD killed %7:gpr, 0       ^^^^^^^^^^^^^
     STD killed %8:gpr, %1:gpr, 0        this is updated

Instructions (1) and (2) would be updated to: ADD_rr %0:gpr(tied-def 0), killed %2:gpr The `killed` mark is inherited from machine operands `killed %3:gpr` and `killed %6:gpr` which are updated in place by `processDstReg()`. This commit updates `processDstReg()` to reset kill marks for updated machine operands to keep liveness information conservatively correct. Differential Revision: https://reviews.llvm.org/D157805	2023-08-15 02:23:38 +03:00
Eduard Zingerman	e66affa17e	Revert "[BPF] support for BPF_ST instruction in codegen" This reverts commit 92e28e397d4ccf1bff075f48e22cf1e23a7d02bf. Reverting to investigate buildbot failure reported in [1]. field-reloc-st-imm.ll: * Bad machine code: Explicit definition must be a register * - function: bar - basic block: %bb.0 entry (0x742f318) - instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", ... - operand 0: 3 * Bad machine code: Explicit definition must be a register * - function: bar - basic block: %bb.0 entry (0x742f318) - instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", ... - operand 0: 4 LLVM ERROR: Found 4 machine code errors. [1] https://lab.llvm.org/buildbot/#/builders/16/builds/52877	2023-08-11 02:23:40 +03:00
Eduard Zingerman	92e28e397d	[BPF] support for BPF_ST instruction in codegen Generate store immediate instruction when CPUv4 is enabled. For example: $ cat test.c struct foo { unsigned char b; unsigned short h; unsigned int w; unsigned long d; }; void bar(volatile struct foo *p) { p->b = 1; p->h = 2; p->w = 3; p->d = 4; } $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d - ... 0000000000000000 <bar>: 0: 72 01 00 00 01 00 00 00 *(u8 *)(r1 + 0x0) = 0x1 1: 6a 01 02 00 02 00 00 00 *(u16 *)(r1 + 0x2) = 0x2 2: 62 01 04 00 03 00 00 00 *(u32 *)(r1 + 0x4) = 0x3 3: 7a 01 08 00 04 00 00 00 *(u64 *)(r1 + 0x8) = 0x4 4: 95 00 00 00 00 00 00 00 exit Take special care to: - apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST - validate immediate value when BPF_ST write is 64-bit: BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such write when immediate is -1, but it is incorrect to generate such write when immediate is +0xffff_ffff. Differential Revision: https://reviews.llvm.org/D140804	2023-08-11 02:07:29 +03:00
Tamir Duberstein	055893beac	[BPF] Don't crash on missing line info When compiling Rust code we may end up with calls to functions provided by other code units. Presently this code crashes on a null pointer dereference - this patch avoids that crash and adds a test. Reviewed By: ast Differential Revision: https://reviews.llvm.org/D156446	2023-08-03 09:18:12 -04:00
Tamir Duberstein	59afd29899	[BPF] Match CHECK w/ LLVM_ENABLE_ASSERTIONS=OFF (D156136)	2023-08-01 11:12:43 +09:00
Tamir Duberstein	d542a56c1c	[BPF] Clean up SelLowering This patch contains a number of uncontroversial changes: - Replace all uses of `errs`, `assert`, `llvm_unreachable` with `report_fatal_error` with informative error strings. - Replace calls to `fail` in loops with at most one call per error instance. Previously a function with 19 arguments would log "too many args" 14 times. This was not helpful. - Change one `if (..) switch ...` to `if (..) { switch ...`. The added brace is consistent with a near-identical switch immediately above. - Elide one `SDValue` copy by using a reference rather than value. This is consistent with a variable declared immediately before it. Reviewed By: yonghong-song Differential Revision: https://reviews.llvm.org/D156136	2023-08-01 00:31:12 +03:00
Yonghong Song	6c412b6c6f	[BPF] Add a few new insns under cpu=v4 In [1], a few new insns are proposed to expand BPF ISA to . fixing the limitation of existing insn (e.g., 16bit jmp offset) . adding new insns which may improve code quality (sign_ext_ld, sign_ext_mov, st) . feature complete (sdiv, smod) . better user experience (bswap) This patch implemented insn encoding for . sign-extended load . sign-extended mov . sdiv/smod . bswap insns . unconditional jump with 32bit offset The new bswap insns are generated under cpu=v4 for __builtin_bswap. For cpu=v3 or earlier, for __builtin_bswap, be or le insns are generated which is not intuitive for the user. To support 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented. For conditional branch which is beyond 16-bit offset, llvm will do some transformation 'cond_jmp' -> 'cond_jmp + jmpl' to simulate 32bit conditional jmp. See BPFMIPeephole.cpp for details. The algorithm is heuristic based. I have tested bpf selftest pyperf600 with unroll count 600 which can indeed generate 32-bit jump insn, e.g., 13: 06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619> Eduard is working on adding 'st' insn to cpu=v4. A list of llc flags: disable-ldsx, disable-movsx, disable-bswap, disable-sdiv-smod, disable-gotol can be used to disable a particular insn for cpu v4. For example, user can do: llc -march=bpf -mcpu=v4 -disable-movsx t.ll to enable cpu v4 without movsx insns. References: [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/ Differential Revision: https://reviews.llvm.org/D144829	2023-07-26 08:37:30 -07:00
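The 16-bit limit that motivates the 32-bit jump in the commit above can be illustrated with a range check. This is only a sketch of the underlying constraint, assuming a signed 16-bit offset field for ordinary jumps; `fitsBranchOffset16` is a hypothetical name, and the actual heuristic in BPFMIPeephole.cpp is more involved than this.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a branch offset (in instructions) can be
// encoded in the signed 16-bit offset field of a regular BPF jump.
constexpr bool fitsBranchOffset16(std::int64_t offset) {
  return offset >= std::numeric_limits<std::int16_t>::min() &&
         offset <= std::numeric_limits<std::int16_t>::max();
}

// Short jumps such as `goto +0x2` fit the 16-bit field.
static_assert(fitsBranchOffset16(0x2), "short jump fits");

// The pyperf600 example `gotol +0xcd9b` does not: 0xcd9b is 52635,
// which exceeds the signed 16-bit maximum of 32767.
static_assert(!fitsBranchOffset16(0xcd9b), "needs a 32-bit jump");
```

When the check fails for a conditional branch, the message describes pairing the conditional jump with a 32-bit unconditional jump to reach the distant target.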
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Eduard Zingerman	18e13739b8	[BPF] Undo transformation for LICM.cpp:hoistMinMax() Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation that undoes LICM hoistMinMax pass. The undo transformation converts the following patterns: x < min(a, b) -> x < a && x < b x > min(a, b) -> x > a || x > b x < max(a, b) -> x < a || x < b x > max(a, b) -> x > a && x > b Where 'a' or 'b' is a constant. Also supports `sext min(...) ...` and `zext min(...) ...`. ~~~ This was previously committed as 09feee559a29 and reverted in 0bf9bfeacc8c because of the testbot memory leak report: https://lab.llvm.org/buildbot/#/builders/5/builds/34931 The memory leak issue was caused by incorrect instruction removal sequence in skinMinMaxBB(): the sequence `I->dropAllReferences(); I->removeFromParent();` was fixed to `I->eraseFromParent();` Differential Revision: https://reviews.llvm.org/D147990	2023-07-11 22:30:34 +03:00
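The four min/max rewrites listed in the commit above are plain logical equivalences, which a short sketch can confirm. The names `minOf`, `maxOf` and `sinkRewritesAgree` are hypothetical and exist only for this illustration; the sketch checks the identities on integers and does not model the IR-level pass or its requirement that 'a' or 'b' be a constant.

```cpp
// Hypothetical helpers for illustration; not part of BPFCheckAndAdjustIR.
constexpr int minOf(int a, int b) { return a < b ? a : b; }
constexpr int maxOf(int a, int b) { return a > b ? a : b; }

// True when all four rewrites give the same answer as the original
// comparison against min/max for the given operands.
constexpr bool sinkRewritesAgree(int x, int a, int b) {
  return ((x < minOf(a, b)) == (x < a && x < b)) &&
         ((x > minOf(a, b)) == (x > a || x > b)) &&
         ((x < maxOf(a, b)) == (x < a || x < b)) &&
         ((x > maxOf(a, b)) == (x > a && x > b));
}

// Spot checks with x below, between, equal to and above the operands.
static_assert(sinkRewritesAgree(0, 3, 7), "x below both");
static_assert(sinkRewritesAgree(5, 3, 7), "x between a and b");
static_assert(sinkRewritesAgree(3, 3, 7), "x equal to the minimum");
static_assert(sinkRewritesAgree(9, 3, 7), "x above both");
static_assert(sinkRewritesAgree(-4, 7, -4), "operands swapped, negative");
```

Because each rewritten form compares `x` against `a` and `b` separately, the comparison against the constant operand stays visible as its own condition.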
Eduard Zingerman	0bf9bfeacc	Revert "[BPF] Undo transformation for LICM.cpp:hoistMinMax()" This reverts commit 09feee559a294611257ee157dba039fb05fe4f68. Revert because of a testbot failure: https://lab.llvm.org/buildbot/#/builders/5/builds/34931	2023-07-07 04:01:31 +03:00
Eduard Zingerman	09feee559a	[BPF] Undo transformation for LICM.cpp:hoistMinMax() Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation that undoes LICM hoistMinMax pass. The undo transformation converts the following patterns: x < min(a, b) -> x < a && x < b x > min(a, b) -> x > a || x > b x < max(a, b) -> x < a || x < b x > max(a, b) -> x > a && x > b Where 'a' or 'b' is a constant. Also supports `sext min(...) ...` and `zext min(...) ...`. Differential Revision: https://reviews.llvm.org/D147990	2023-07-06 16:19:59 +03:00
Eduard Zingerman	6a6db74b77	[BPF] Propagate NoMerge attribute when lowering function calls `NoMerge` attribute on machine instructions prevents certain transformations from merging these instructions. One of such transformations is 'llvm/lib/CodeGen/BranchFolding.cpp'. This attribute should be copied from IR `call` instructions to machine level instructions. See `X86TargetLowering::LowerCall` as another example. Differential Revision: https://reviews.llvm.org/D152987	2023-06-27 01:15:45 +03:00
Fangrui Song	2a61ceddb3	[BPF] Remove unused legacy passes after TargetMachine::adjustPassManager removal D137796 made these passes unused. `opt --bpf-ir-peephole` is specified in one test. Add a `registerPipelineParsingCallback` so that we can change the test to use `opt --passes=bpf-ir-peephole` instead.	2023-06-24 22:44:06 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Eduard Zingerman	8f906bec79	[BPF] Make sure ALU32 feature is set in MCSubtargetInfo for mcpu=v3 `BPF.td` is used to generate (among other things) `MCSubtargetInfo` setup function for BPF target. Specifically, the `BPFGenSubtargetInfo.inc` file: enum { ALU32 = 0, ... }; ... extern const llvm::SubtargetSubTypeKV BPFSubTypeKV[] = { { "generic", { { { 0x0ULL, ... } } }, ... }, { "probe", { { { 0x0ULL, ... } } }, ... }, { "v1", { { { 0x0ULL, ... } } }, ... }, { "v2", { { { 0x0ULL, ... } } }, ... }, { "v3", { { { 0x1ULL, ... } } }, ... }, }; ... static inline MCSubtargetInfo *createBPFMCSubtargetInfoImpl(...) { return new BPFGenMCSubtargetInfo(..., BPFSubTypeKV, ...); } The `SubtargetSubTypeKV` is defined in `MCSubtargetInfo.h` as: /// Used to provide key value pairs for feature and CPU bit flags. struct SubtargetSubTypeKV { const char *Key; ///< K-V key string FeatureBitArray Implies; ///< K-V bit mask FeatureBitArray TuneImplies; ///< K-V bit mask const MCSchedModel *SchedModel; ... } The first bit array specifies features enabled by default for a specific CPU. This commit makes sure that this information is communicated to `tablegen` and correct `BPFSubTypeKV` table is generated. This allows tools like `objdump` to detect available features when `--mcpu` flag is specified. Differential Revision: https://reviews.llvm.org/D148037	2023-04-17 20:08:45 +03:00
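How the generated table in the commit above encodes "v3 implies ALU32" can be shown with a toy model. Everything here is an assumption made for illustration: `CpuEntry`, `kCpuTable`, `strEq` and `cpuImplies` are hypothetical, and the single 64-bit mask stands in for LLVM's `FeatureBitArray`, whose real layout is elided in the message.

```cpp
#include <cstdint>

// Feature index as in the generated enum: ALU32 occupies bit 0.
enum Feature : unsigned { ALU32 = 0 };

// Toy stand-in for SubtargetSubTypeKV: a CPU name plus one mask word.
struct CpuEntry {
  const char *Key;
  std::uint64_t Implies;
};

// Mirrors the first mask word of each row shown in the commit message.
constexpr CpuEntry kCpuTable[] = {
    {"generic", 0x0ULL}, {"probe", 0x0ULL}, {"v1", 0x0ULL},
    {"v2", 0x0ULL},      {"v3", 0x1ULL},
};
constexpr unsigned kCpuCount = sizeof(kCpuTable) / sizeof(kCpuTable[0]);

// Compile-time string comparison (single-return form, valid in C++11).
constexpr bool strEq(const char *a, const char *b) {
  return *a == *b && (*a == '\0' || strEq(a + 1, b + 1));
}

// True when the named CPU enables the feature by default; unknown CPU
// names yield false.
constexpr bool cpuImplies(const char *cpu, Feature f, unsigned i = 0) {
  return i < kCpuCount &&
         (strEq(kCpuTable[i].Key, cpu)
              ? ((kCpuTable[i].Implies >> f) & 1u) != 0
              : cpuImplies(cpu, f, i + 1));
}

static_assert(cpuImplies("v3", ALU32), "v3 row has bit 0 set");
static_assert(!cpuImplies("v2", ALU32), "v2 row has no feature bits");
```

In this model a tool that is given only the CPU name can recover the default features by reading the row's mask, which is the behaviour the commit describes for `objdump` with `--mcpu`.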
Momchil Velikov	4ac6f99ae0	[LiveInterval] Fix live range overlap check Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D145707	2023-04-11 11:11:30 +01:00
Eduard Zingerman	d0d1431ab1	[BPF] Fix assembly parsing errors for atomic_fetch_* instructions Fixes BPF assembler parsing errors for the following instructions: - atomic_fetch_add - atomic_fetch_and - atomic_fetch_xor - atomic_fetch_or - cmpxchg32_32 - cmpxchg_64 - xchg32_32 - xchg_64 Also add a test to verify that all instructions could be assembled and disassembled. Differential Revision: https://reviews.llvm.org/D147421	2023-04-05 00:55:32 +03:00
Yonghong Song	db3d2adecb	[BPF] Improve pruning to avoid generate more types in BTF Commit 3671bdbcd214("[BPF] Fix a BTF type pruning bug") fixed a pruning bug to allow generating more types. But the commit has a bug which permits generating more types than necessary. The following is an example to illustrate the problem. struct t1 { int a; }; struct t2 { struct t1 *p1; struct t1 *p2; int b; }; int foo(struct t2 *arg) { return arg->b; } The following is the part of BTF generation sequence: (1). 'struct t2 *arg' -> 'struct t1 *p1' In this step, the type 'struct t1' will be generated as a forward decl and the ptr type (to 'struct t1') will be stored in the internal type table. (2). now the second field 'struct t1 *p2' will be processed. Since the ptr type (to 'struct t1') is already in the type table, the existing logic strips out ptr modifier and is able to generate BTF type for 'struct t1'. In the above step (2), if CheckPointer is true (the type traversal chain including a struct member), 'ptr' modifier should be checked and the subsequent type generation should be skipped since the same case has been processed in visitDerivedType(). The issue is exposed when I am trying to use llvm15 to compile some internal bpf programs. The bpf skeleton put the whole ELF section (after stripping some sections like dwarf) as a string. The large BTF section triggered the following error: bpf_object_with_struct_ops_test_prog_bpf/BpfObjectWithStructOpsTestProg.skel.h:222:23: error: string literal of length 140144 exceeds maximum length 65536 that C++ compilers are required to support [-Werror,-Woverlength-strings] return (const void *)"\ ^~ 1 error generated. Although adding -Wno-overlength-strings could work around the issue, improving llvm BTF generation sounds better esp. for users using vmlinux.h. Differential Revision: https://reviews.llvm.org/D145816	2023-03-13 09:34:37 -07:00
Andrew Savonichev	c65b4d64d4	[SelectionDAG] Do not second-guess alignment for alloca Alignment of an alloca in IR can be lower than the preferred alignment on purpose, but this override essentially treats the preferred alignment as the minimum alignment. The patch changes this behavior to always use the specified alignment. If alignment is not set explicitly in LLVM IR, it is set to DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign. Tests are changed as well: explicit alignment is increased to match the preferred alignment if it changes output, or omitted when it is hard to determine the right value (e.g. for pointers, some structs, or weird types). Differential Revision: https://reviews.llvm.org/D135462	2023-02-09 18:45:20 +03:00
Eduard Zingerman	f60aefdc7f	[BPF] generate btf_decl_tag records for params of extern functions After frontend changes in the following commit: "BPF: preserve btf_decl_tag for parameters of extern functions" same mechanics could be used to get the list of function parameters and associated btf_decl_tag entries for both extern and non-extern functions. This commit extracts these mechanics as a separate auxiliary function BTFDebug::processDISubprogram(). The function is called for both extern and non-extern functions in order to generate corresponding BTF_DECL_TAG records. Differential Revision: https://reviews.llvm.org/D140971	2023-01-07 09:32:18 -08:00
Eduard Zingerman	ed068386b4	[BPF] Use SectionForGlobal() for section names computation in BTF Use function TargetLoweringObjectFile::SectionForGlobal() to compute section names for globals described in BTF_KIND_DATASEC records. This fixes a discrepancy in section name computation between BTFDebug::processGlobals and the rest of the LLVM pipeline. Specifically, the following example illustrates the discrepancy before this commit: struct Foo { int i; } __attribute__((aligned(16))); struct Foo foo = { 0 }; The initializer for 'foo' looks as follows: %struct.Foo { i32 0, [12 x i8] undef } TargetLoweringObjectFile::SectionForGlobal() classifies 'foo' as a part of '.bss' section, while BTFDebug::processGlobals classified it as a part of '.data' section because of the following expression: SecName = Global.getInitializer()->isZeroValue() ? ".bss" : ".data" The isZeroValue() returns false because of the undef tail of the initializer, while SectionForGlobal() allows such patterns in '.bss'. Differential Revision: https://reviews.llvm.org/D140505	2022-12-29 11:27:19 -08:00


324 Commits