llvm-project

Author	SHA1	Message	Date
Nikita Popov	ff9af4c43a	[CodeGen] Convert tests to opaque pointers (NFC)	2024-02-05 14:07:09 +01:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is to differentiate a missing datalayout from an explicit `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Nikita Popov	2ad9fde418	[MemDep] Use EarliestEscapeInfo (#69727) Use BatchAA with EarliestEscapeInfo instead of callCapturesBefore() in MemDepAnalysis. The advantage of this is that it will also take not-captured-before information into account for non-calls (see test_store_before_capture for a representative example), and that this is a cached analysis. The disadvantage is that EII is slightly less precise than full CapturedBefore analysis. In practice the impact is positive, with gvn.NumGVNLoad going from 22022 to 22808 on test-suite. The impact to compile-time is also positive, mainly in the ThinLTO configuration.	2023-10-23 09:57:26 +02:00
Alex Richardson	83c4227ab7	Auto-generate test checks for tests affected by D141060 These files had manual CHECK lines which make the diff from D141060 very difficult to review.	2023-10-04 10:51:35 -07:00
Eduard Zingerman	8f28e8069c	[BPF] support for BPF_ST instruction in codegen Generate store immediate instruction when CPUv4 is enabled. For example: $ cat test.c struct foo { unsigned char b; unsigned short h; unsigned int w; unsigned long d; }; void bar(volatile struct foo *p) { p->b = 1; p->h = 2; p->w = 3; p->d = 4; } $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - \| llvm-objdump -d - ... 0000000000000000 <bar>: 0: 72 01 00 00 01 00 00 00 *(u8 *)(r1 + 0x0) = 0x1 1: 6a 01 02 00 02 00 00 00 *(u16 *)(r1 + 0x2) = 0x2 2: 62 01 04 00 03 00 00 00 *(u32 *)(r1 + 0x4) = 0x3 3: 7a 01 08 00 04 00 00 00 *(u64 *)(r1 + 0x8) = 0x4 4: 95 00 00 00 00 00 00 00 exit Take special care to: - apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST - validate immediate value when BPF_ST write is 64-bit: BPF interprets `(BPF_ST \| BPF_MEM \| BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such write when immediate is -1, but it is incorrect to generate such write when immediate is +0xffff_ffff. This commit was previously reverted in e66affa17e32. The reason for revert was an unrelated bug in BPF backend, triggered by test case added in this commit if LLVM is built with LLVM_ENABLE_EXPENSIVE_CHECKS. The bug was fixed in D157806. Differential Revision: https://reviews.llvm.org/D140804	2023-08-16 17:51:28 +03:00
Eduard Zingerman	08d92dedd2	[BPF] Fix in/out argument constraints for CORE_MEM instructions When LLVM is built with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option the following C code snippet: struct t { int a; } __attribute__((preserve_access_index)); void test(struct t *t) { t->a = 42; } Causes an assertion: $ clang -g -O2 -c --target=bpf -mcpu=v2 t.c -o /dev/null Function Live Ins: $r1 in %0 bb.0.entry: liveins: $r1 DBG_VALUE $r1, $noreg, !"t", ... %0:gpr = COPY $r1 DBG_VALUE %0:gpr, $noreg, !"t", ... %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr %4:gpr = MOV_ri 42 CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... RET debug-location !25; t.c:7:1 *** Bad machine code: Explicit definition marked as use *** - function: test - basic block: %bb.0 entry (0x6210000d8a90) - instruction: CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... - operand 0: killed %4:gpr This happens because `CORE_MEM` instruction is defined to have output operands: def CORE_MEM : TYPE_LD_ST<BPF_MEM.Value, BPF_W.Value, (outs GPR:$dst), (ins u64imm:$opcode, GPR:$src, u64imm:$offset), "$dst = core_mem($opcode, $src, $offset)", []>; As documented in [1]: > By convention, the LLVM code generator orders instruction operands > so that all register definitions come before the register uses, even > on architectures that are normally printed in other orders. In other words, the first argument for `CORE_MEM` is considered to be a "def", while in reality it is "use": %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr %4:gpr = MOV_ri 42 '---------------. v CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... Here is how `CORE_MEM` is constructed in `BPFMISimplifyPatchable::checkADDrr()`: BuildMI(DefInst->getParent(), DefInst, DefInst->getDebugLoc(), TII->get(COREOp)) .add(DefInst->getOperand(0)).addImm(Opcode).add(*BaseOp) .addGlobalAddress(GVal); Note that the first operand is constructed as `.add(DefInst->getOperand(0))`.
For `LD{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a destination register of a load, so instruction is constructed in accordance with `outs` declaration. For `ST{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a source register of a store (value to be stored), so instruction violates the `outs` declaration. This commit fixes the issue by splitting `CORE_MEM` in three instructions: `CORE_ST`, `CORE_LD64`, `CORE_LD32` with correct `outs` specifications. [1] https://llvm.org/docs/CodeGenerator.html#the-machineinstr-class Differential Revision: https://reviews.llvm.org/D157806	2023-08-15 02:34:21 +03:00
Eduard Zingerman	27026fe563	[BPF] Reset machine register kill mark in BPFMISimplifyPatchable When LLVM is built with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option the following C code snippet: struct t { unsigned long a; } __attribute__((preserve_access_index)); void foo(volatile struct t *t, volatile unsigned long *p) { *p = t->a; *p = t->a; } Causes an assertion: $ clang -g -O2 -c --target=bpf -mcpu=v2 t2.c -o /dev/null # After BPF PreEmit SimplifyPatchable # Machine code for function foo: IsSSA, TracksLiveness Function Live Ins: $r1 in %0, $r2 in %1 bb.0.entry: liveins: $r1, $r2 DBG_VALUE $r1, $noreg, !"t", !DIExpression() DBG_VALUE $r2, $noreg, !"p", !DIExpression() %1:gpr = COPY $r2 DBG_VALUE %1:gpr, $noreg, !"p", !DIExpression() %0:gpr = COPY $r1 DBG_VALUE %0:gpr, $noreg, !"t", !DIExpression() %2:gpr = LD_imm64 @"llvm.t:0:0$0:0" %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr %5:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0" STD killed %5:gpr, %1:gpr, 0 %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr %8:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0" STD killed %8:gpr, %1:gpr, 0 RET # End machine code for function foo. *** Bad machine code: Using a killed virtual register *** - function: foo - basic block: %bb.0 entry (0x6210000e6690) - instruction: %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr - operand 2: killed %2:gpr This happens because of the way BPFMISimplifyPatchable::processDstReg() updates the second operand of the `ADD_rr` instruction. Code before `BPFMISimplifyPatchable`: .-> %2:gpr = LD_imm64 @"llvm.t:0:0$0:0" \| \|`----------------. \| %3:gpr = LDD %2:gpr, 0 \| %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %3:gpr <--- (1) \| %5:gpr = LDD killed %4:gpr, 0 ^^^^^^^^^^^^^ \| STD killed %5:gpr, %1:gpr, 0 this is updated `----------------.
%6:gpr = LDD %2:gpr, 0 %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %6:gpr <--- (2) %8:gpr = LDD killed %7:gpr, 0 ^^^^^^^^^^^^^ STD killed %8:gpr, %1:gpr, 0 this is updated Instructions (1) and (2) would be updated to: ADD_rr %0:gpr(tied-def 0), killed %2:gpr The `killed` mark is inherited from machine operands `killed %3:gpr` and `killed %6:gpr` which are updated in place by `processDstReg()`. This commit updates `processDstReg()` to reset kill marks for updated machine operands to keep liveness information conservatively correct. Differential Revision: https://reviews.llvm.org/D157805	2023-08-15 02:23:38 +03:00
Eduard Zingerman	e66affa17e	Revert "[BPF] support for BPF_ST instruction in codegen" This reverts commit 92e28e397d4ccf1bff075f48e22cf1e23a7d02bf. Reverting to investigate buildbot failure reported in [1]. field-reloc-st-imm.ll: *** Bad machine code: Explicit definition must be a register *** - function: bar - basic block: %bb.0 entry (0x742f318) - instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", ... - operand 0: 3 *** Bad machine code: Explicit definition must be a register *** - function: bar - basic block: %bb.0 entry (0x742f318) - instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", ... - operand 0: 4 LLVM ERROR: Found 4 machine code errors. [1] https://lab.llvm.org/buildbot/#/builders/16/builds/52877	2023-08-11 02:23:40 +03:00
Eduard Zingerman	92e28e397d	[BPF] support for BPF_ST instruction in codegen Generate store immediate instruction when CPUv4 is enabled. For example: $ cat test.c struct foo { unsigned char b; unsigned short h; unsigned int w; unsigned long d; }; void bar(volatile struct foo *p) { p->b = 1; p->h = 2; p->w = 3; p->d = 4; } $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - \| llvm-objdump -d - ... 0000000000000000 <bar>: 0: 72 01 00 00 01 00 00 00 *(u8 *)(r1 + 0x0) = 0x1 1: 6a 01 02 00 02 00 00 00 *(u16 *)(r1 + 0x2) = 0x2 2: 62 01 04 00 03 00 00 00 *(u32 *)(r1 + 0x4) = 0x3 3: 7a 01 08 00 04 00 00 00 *(u64 *)(r1 + 0x8) = 0x4 4: 95 00 00 00 00 00 00 00 exit Take special care to: - apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST - validate immediate value when BPF_ST write is 64-bit: BPF interprets `(BPF_ST \| BPF_MEM \| BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such write when immediate is -1, but it is incorrect to generate such write when immediate is +0xffff_ffff. Differential Revision: https://reviews.llvm.org/D140804	2023-08-11 02:07:29 +03:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Nikita Popov	c49e0840fc	[BPFAbstractMemberAccess] Use WeakTrackingVH for Base The value will be RAUWd, make sure the reference in CallInfo gets updated. It seems like this was not a problem without opaque pointers due to the bitcast in between.	2022-12-19 15:24:52 +01:00
Nikita Popov	e95a3cc5fe	[BPF] Restore failing offset-reloc-cast-struct tests (NFC) After opaque pointer conversion these tests fail with a use after free under asan, due to bugs in BPFAbstractMemberAccess. For now, restore the tests to unbreak build bots.	2022-12-19 14:31:38 +01:00
Nikita Popov	6022873372	[BPF] Convert some tests to opaque pointers (NFC)	2022-12-19 12:46:54 +01:00
Yonghong Song	6e6c1efe04	[BPF] Handle anon record for CO-RE relocations When experimenting in the kernel, for kernel data structure sockptr_t in CO-RE operation, I hit an assertion error. The sockptr_t definition and usage look like below: #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record) typedef struct { union { void *kernel; void *user; }; unsigned is_kernel : 1; } sockptr_t; #pragma clang attribute pop int test(sockptr_t *arg) { return arg->is_kernel; } The assertion error looks like clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:878: llvm::Value* {anonymous}::BPFAbstractMemberAccess::computeBaseAndAccessKey(llvm::CallInst*, {anonymous}::BPFAbstractMemberAccess::CallInfo&, std::__cxx11::string&, llvm::MDNode*&): Assertion `TypeName.size()' failed. In this particular case, the clang frontend attaches the debuginfo metadata associated with the anon structure to the preserve_access_info IR intrinsic. But the first debuginfo type has to be a named type so libbpf can have a sound start to do CO-RE relocation. Besides the above approach using pragma to push attribute, the below typedef/struct definition can have preserve_access_index directly applying to the anon struct. typedef struct { union { void *kernel; void *user; }; unsigned is_kernel : 1; } __attribute__((preserve_access_index)) sockptr_t; This patch fixed the issue by preprocessing function argument/return types and local variable types used by other CO-RE intrinsics. For any typedef struct/union { ... } typedef_name an association of <anon struct/union, typedef> is recorded to replace the IR intrinsic metadata 'anon struct/union' to 'typedef'. It is possible that two different 'typedef' types may have identical anon struct/union type. For such a case, the association will be <anon struct/union, nullptr> to indicate the invalid case. Differential Revision: https://reviews.llvm.org/D129621	2022-07-13 15:16:16 -07:00
Daniel Müller	d129ac27e8	[BPF] Introduce support for type match relocations Among others, BPF currently supports the type-exists CO-RE relocation (e.g., see D83878 & D83242). Its intention, as the name tries to convey, is to be used for checking existence of a type in a target. While that check is useful and has its place, we would also like to be able to perform stricter type queries: instead of just checking mere existence, we want to make sure that members match up in composite types, that enum variants are present, etc. We refer to this as "type match". This change proposes the addition of a new relocation variant/value that we intend to use for establishing this match relation. Differential Revision: https://reviews.llvm.org/D126838	2022-06-29 18:23:08 -07:00
Yonghong Song	dc1c43d726	[BPF] Add BTF 64bit enum value support Current BTF only supports 32-bit value. For example, enum T { VAL = 0xffffFFFF00000008 }; the generated BTF looks like .long 16 # BTF_KIND_ENUM(id = 4) .long 100663297 # 0x6000001 .long 8 .long 18 .long 8 The encoded value is 8 which equals (uint32_t)0xffffFFFF00000008 and this is incorrect. This patch introduced BTF_KIND_ENUM64 which permits to encode 64-bit value. The format for each enumerator looks like: .long name_offset .long (uint32_t)value # lower-32 bit value .long value >> 32 # high-32 bit value We use two 32-bit values to represent a 64-bit value as current BTF type subsection has 4-byte alignment and gaps are not permitted in the subsection. This patch also added support for kflag (the bit 31 of CommonType.Info) such that kflag = 1 implies the value is signed and kflag = 0 implies the value is unsigned. The kernel UAPI enumerator definition is struct btf_enum { __u32 name_off; __s32 val; }; so kflag = 0 with unsigned value provides backward compatibility. With this patch, for enum T { VAL = 0xffffFFFF00000008 }; the generated BTF looks like .long 16 # BTF_KIND_ENUM64(id = 4) .long 318767105 # 0x13000001 .long 8 .long 18 .long 8 # 0x8 .long 4294967295 # 0xffffffff and the enumerator value and signedness are encoded correctly. Differential Revision: https://reviews.llvm.org/D124641	2022-06-06 11:35:50 -07:00
Ivan Kosarev	ad1d60c3be	[FileCheck] Catch misspelled directives. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D125604	2022-05-26 11:37:19 +01:00
Peter Klausler	497a5f0415	[BPF] Fix a bug in BPFMISimplifyPatchable pass LLVM BPF pass SimplifyPatchable is used to do necessary code conversion for CO-RE operations. When studying bpf selftest 'exhandler', I found a corner case not handled properly. The following is the C code, modified from original 'exhandler' code. int g; int test(struct t1 *p) { struct t2 *q = p->q; if (q) return 0; struct t3 *f = q->f; if (!f) g = 5; return 0; } For code: struct t3 *f = q->f; if (!f) ... The IR before BPFMISimplifyPatchable pass looks like: %5:gpr = LD_imm64 @"llvm.t2:0:8$0:1" %6:gpr = LDD killed %5:gpr, 0 %7:gpr = LDD killed %6:gpr, 0 JNE_ri killed %7:gpr, 0, %bb.3 JMP %bb.2 Note that compiler knows q = 0 based on dataflow and value analysis. The correct generated code after the pass should be %5:gpr = LD_imm64 @"llvm.t2:0:8$0:1" %7:gpr = LDD killed %5:gpr, 0 JNE_ri killed %7:gpr, 0, %bb.3 JMP %bb.2 But the current implementation did further optimization for the above code and generates %5:gpr = LD_imm64 @"llvm.t2:0:8$0:1" JNE_ri killed %5:gpr, 0, %bb.3 JMP %bb.2 which is incorrect. This patch added a cache to remember those load insns not associated with CO-RE offset value and will skip these load insns during transformation. Differential Revision: https://reviews.llvm.org/D123883	2022-04-19 15:24:26 -07:00
Yonghong Song	6ee71e53e5	[BPF] handle opaque-pointer for __builtin_preserve_enum_value Opaque pointer [1] is enabled as the default with commit [2]. Andrii found that current __builtin_preserve_enum_value() can only handle non opaque pointer code pattern and will segfault with latest llvm main branch where opaque-pointer is enabled by default. This patch added the opaque pointer support. Besides llvm selftests, also verified with bpf-next bpf selftests. [1] https://llvm.org/docs/OpaquePointers.html [2] https://reviews.llvm.org/D123122 Differential Revision: https://reviews.llvm.org/D123800	2022-04-14 11:34:32 -07:00
Yonghong Song	98e2274458	[BPF] fix a CO-RE bitfield relocation error with >8 record alignment Jussi Maki reported a fatal error like below for a bitfield CO-RE relocation: fatal error: error in backend: Unsupported field expression for llvm.bpf.preserve.field.info, requiring too big alignment The failure is related to kernel struct thread_struct. The following is a simplified example. Suppose we have below structure: struct t2 { int a[8]; } __attribute__((aligned(64))) __attribute__((preserve_access_index)); struct t1 { int f1:1; int f2:2; struct t2 f3; } __attribute__((preserve_access_index)); Note that struct t2 has alignment 64, which is used sometimes in the kernel to enforce cache line alignment. The above struct will be encoded into BTF and the following is what C code looks like and the struct will appear in the file like vmlinux.h. struct t2 { int a[8]; long: 64; long: 64; long: 64; long: 64; } __attribute__((preserve_access_index)); struct t1 { int f1: 1; int f2: 2; long: 61; long: 64; long: 64; long: 64; long: 64; long: 64; long: 64; long: 64; struct t2 f3; } __attribute__((preserve_access_index)); Note that after origin_source -> BTF -> new_source transition, the new source has the same memory layout as the old one but the alignment interpretation inside the compiler could be different. The bpf program will use the latter explicitly padded structure as in vmlinux.h. In the above case, the compiler internal ABI alignment for new struct t1 is 16 while it is 4 for old struct t1. I didn't do a thorough investigation why the ABI alignment is 16 and I suspect it is related to anonymous padding in the above. Current BPF bitfield CO-RE handling requires alignment <= 8 so proper bitfield operation can be performed. Therefore, alignment 16 will cause a compiler fatal error. To fix the ABI alignment >=16, let us check whether the bitfield can be held within an 8-byte-aligned range. If this is the case, we can use alignment 8. Otherwise, a fatal error will be reported.
Differential Revision: https://reviews.llvm.org/D121821	2022-03-16 12:16:46 -07:00
Nikita Popov	90ec6dff86	[OpaquePtr] Forbid mixing typed and opaque pointers Currently, opaque pointers are supported in two forms: The -force-opaque-pointers mode, where all pointers are opaque and typed pointers do not exist. And as a simple ptr type that can coexist with typed pointers. This patch removes support for the mixed mode. You either get typed pointers, or you get opaque pointers, but not both. In the (current) default mode, using ptr is forbidden. In -opaque-pointers mode, all pointers are opaque. The motivation here is that the mixed mode introduces additional issues that don't exist in fully opaque mode. D105155 is an example of a design problem. Looking at D109259, it would probably need additional work to support mixed mode (e.g. to generate GEPs for typed base but opaque result). Mixed mode will also end up inserting many casts between i8* and ptr, which would require significant additional work to consistently avoid. I don't think the mixed mode is particularly valuable, as it doesn't align with our end goal. The only thing I've found it to be moderately useful for is adding some opaque pointer tests in between typed pointer tests, but I think we can live without that. Differential Revision: https://reviews.llvm.org/D109290	2021-09-10 15:18:23 +02:00
Nikita Popov	be5af50e7d	[BPF] Use elementtype attribute for preserve.array/struct.index intrinsics Use the elementtype attribute introduced in D105407 for the llvm.preserve.array/struct.index intrinsics. It carries the element type of the GEP these intrinsics effectively encode. This patch: * Adds a verifier check that the attribute is required. * Adds it in the IRBuilder methods for these intrinsics. * Autoupgrades old bitcode without the attribute. * Updates the lowering code to use the attribute rather than the pointer element type. * Updates lots of tests to specify the attribute. * Adds -force-opaque-pointers to the intrinsic-array.ll test to demonstrate they work now. https://reviews.llvm.org/D106184	2021-07-17 11:09:18 +02:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attribute is equivalent to setting the attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Yonghong Song	605c811d2b	BPF: fix FIELD_EXISTS relocation with array subscripts Lorenz Bauer reported an issue in bpf mailing list ([1]) where for FIELD_EXISTS relocation, if the object is an array subscript, the patched immediate is the object offset from the base address, instead of 1. Currently in BPF AbstractMemberAccess pass, the final offset from the base address is the patched offset except FIELD_EXISTS which is 1 unconditionally. In this particular case, the last data structure access is not a field (struct/union offset) so it didn't hit the place to set patched immediate to be 1. This patch fixed the issue by checking the relocation type. If the type is FIELD_EXISTS, just set to 1. Tested by modifying some bpf selftests, libbpf is okay with such types with FIELD_EXISTS relocation. [1] https://lore.kernel.org/bpf/CACAyw99n-cMEtVst7aK-3BfHb99GMEChmRLCvhrjsRpHhPrtvA@mail.gmail.com/ Differential Revision: https://reviews.llvm.org/D102036	2021-05-06 22:37:02 -07:00
Yonghong Song	4369223ea7	BPF: make __builtin_btf_type_id() return 64bit int Linux kernel recently added support for kernel modules https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org/ In such cases, a type id in the kernel needs to be presented as (btf id for modules, btf type id for this module). Change __builtin_btf_type_id() to return 64bit value so libbpf can do the above encoding. Differential Revision: https://reviews.llvm.org/D91489	2020-11-16 07:08:41 -08:00
Simon Pilgrim	2224c2f8bc	[BPF] intrinsic-array-2.ll - remove unused check prefixes Just use default CHECK	2020-11-11 18:38:21 +00:00
Yonghong Song	edd71db38b	BPF: avoid duplicated globals for CORE relocations This patch fixed two issues related with relocation globals. In LLVM, if a global, e.g. with name "g", is created and conflicts with another global with the same name, LLVM will rename the global, e.g., with a new name "g.2". Since relocation global name has special meaning, we do not want llvm to change it, so internally we have logic to check whether duplication happens or not. If it happens, just reuse the previous global. The first bug is related to non-btf-id relocation (BPFAbstractMemberAccess.cpp). Commit 54d9f743c8b0 ("BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible") changed ModulePass to FunctionPass, i.e., handling each function at a time. But still just one BPFAbstractMemberAccess object is created so module level de-duplication is still possible. Commit 40251fee0084 ("[BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer pipeline") made a change to create a BPFAbstractMemberAccess object per function so module level de-duplication is not possible any more without going through all module globals. This patch simply changed the map which holds reloc globals as class static, so it will be available to all BPFAbstractMemberAccess objects for different functions. The second bug is related to btf-id relocation (BPFPreserveDIType.cpp). Before Commit 54d9f743c8b0, the pass is a ModulePass, so we have a local variable, incremented for each instance, and works fine. But after Commit 54d9f743c8b0, the pass becomes a FunctionPass. Local variable won't work properly since different functions will start with the same initial value. Fix the issue by changing the local count variable to static, so it will be truly unique across the whole module compilation. Differential Revision: https://reviews.llvm.org/D88942	2020-10-06 22:37:49 -07:00
Arthur Eubanks	40251fee00	[BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer pipeline This involves porting BPFAbstractMemberAccess and BPFPreserveDIType to NPM, then adding them to BPFTargetMachine::registerPassBuilderCallbacks (the NPM equivalent of adjustPassManager()). Reviewed By: yonghong-song, asbirlea Differential Revision: https://reviews.llvm.org/D88855	2020-10-06 07:42:32 -07:00
Yonghong Song	ca1ce397ac	BPF: explicitly specify bpfel triple for certain tests Commit 54d9f743c8b0 ("BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible") changed most of CORE tests with opt run followed by llc and opt requires the target triple specified in the IR. There are a few tests where little endian and big endian will report different result and for little endian versions of tests, "target triple = "bpf"" will produce wrong results if the test is executed on a big endian machine, e.g. PowerPC big endian machine, since target "bpf" represents host endian and will resolve to "bpfeb". The buildbot reported such failures when build-and-run on a PowerPC big endian machine. To fix the issue, use "target triple = "bpfel"" instead.	2020-09-28 20:25:25 -07:00
Yonghong Song	54d9f743c8	BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible Move abstractMemberAccess and PreserveDIType passes as early as possible, right after clang code generation. Currently, compiler may transform the code p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); bpf_probe_read(buf, buf_size, p2); } to p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { bpf_probe_read(buf, buf_size, p2); } and eventually assembly code looks like reloc_exist = 1; reloc_member_offset = 10; //calculate member offset from base p2 = base + reloc_member_offset; if (reloc_exist) { bpf_probe_read(buf, buf_size, p2); } If, during libbpf relocation resolution, reloc_exist is actually resolved to 0 (not exist), the reloc_member_offset relocation cannot be resolved and will be patched with an illegal instruction. This will cause verifier failure. This patch attempts to address this issue by doing chaining analysis and replacing chains with special globals right after clang code gen. This will remove the CSE possibility described above. The IR typically looks like %6 = load @llvm.sk_buff:0:50$0:0:0:2:0 %7 = bitcast %struct.sk_buff* %2 to i8* %8 = getelementptr i8, i8* %7, %6 for a particular address computation relocation. But this transformation has another consequence, code sinking may happen like below: PHI = <possibly different @preserve_*_access_globals> %7 = bitcast %struct.sk_buff* %2 to i8* %8 = getelementptr i8, i8* %7, %6 For such cases, we will not be able to generate relocations since multiple relocations are merged into one.
This patch introduced a passthrough builtin to prevent such optimization. Looks like inline assembly has more impact for optimization, e.g., inlining. Using passthrough has less impact on optimizations. A new IR pass is introduced at the beginning of target-dependent IR optimization, which does: - report fatal error if any reloc global in PHI nodes - remove all bpf passthrough builtin functions Changes for existing CORE tests: - for clang tests, add "-Xclang -disable-llvm-passes" flags to avoid builtin->reloc_global transformation so the test is still able to check correctness for clang generated IR. - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> \| llvm-dis" command before "llc" command since "opt" is needed to call newly-placed builtin->reloc_global transformation. Add target triple in the IR file since "opt" requires it. - Since target triple is added in IR file, if a test may produce different results for different endianness, two tests will be created, one for bpfeb and another for bpfel, e.g., some tests for relocation of lshift/rshift of bitfields. - field-reloc-bitfield-1.ll has different relocations compared to old codes. This is because for the structure in the test, new code returns struct layout alignment 4 while old code is 8. Align 8 is more precise and permits double load. With align 4, the new mechanism uses 4-byte load, so generating different relocations. - test intrinsic-transforms.ll is removed. This is used to test cse on intrinsics so we do not lose metadata. Now metadata is attached to global and not instruction, it won't get lost with cse. Differential Revision: https://reviews.llvm.org/D87153	2020-09-28 16:56:22 -07:00
Yonghong Song	6d218b4adb	BPF: support type exist/size and enum exist/value relocations Four new CO-RE relocations are introduced: - TYPE_EXISTENCE: whether a typedef/record/enum type exists - TYPE_SIZE: the size of a typedef/record/enum type - ENUM_VALUE_EXISTENCE: whether an enum value of an enum type exists - ENUM_VALUE: the enum value of an enum type These additional relocations will make CO-RE bpf programs more adaptive for potential kernel internal data structure changes. Differential Revision: https://reviews.llvm.org/D83878	2020-08-04 12:35:39 -07:00
Elvina Yakubova	b36a3e6140	[llvm-readobj] Update tests because of changes in llvm-readobj behavior This patch updates tests using llvm-readobj and llvm-readelf, because soon reading from stdin will be achievable only via a '-' as described here: https://bugs.llvm.org/show_bug.cgi?id=46400. Patch with changes to llvm-readobj behavior is here: https://reviews.llvm.org/D83704 Differential Revision: https://reviews.llvm.org/D83912 Reviewed by: jhenderson, MaskRay, grimar	2020-07-20 10:39:04 +01:00
Yonghong Song	7f6bc84a97	[BPF] Fix a bug for __builtin_preserve_field_info() processing Andrii discovered a problem where a simple case similar to below will generate wrong relocation kind: enum { FIELD_EXISTENCE = 2, }; struct s1 { int a1; }; int test() { struct s1 *v = 0; return __builtin_preserve_field_info(v[0], FIELD_EXISTENCE); } The expected relocation kind should be FIELD_EXISTENCE, but recorded reloc kind in the final object file is FIELD_BYTE_OFFSET, which is incorrect. This exposed a bug in generating access strings from intrinsics. The current access string generation has two steps: step 1: find the base struct/union type, step 2: traverse members in the base type. The current implementation relies on at least one member access in step 2 to get the correct relocation kind, which is true in typical cases. But if there are no member accesses, the current implementation falls to the default info kind FIELD_BYTE_OFFSET. This is incorrect, we should still record the reloc kind based on the user input. This patch fixed this issue by properly recording the reloc kind in such cases. Differential Revision: https://reviews.llvm.org/D82932	2020-06-30 23:45:37 -07:00
Yonghong Song	3cb7e7bf95	BPF: fix a CORE optimization bug For the test case in this patch like below struct t { int a; } __attribute__((preserve_access_index)); int foo(void *); int test(struct t *arg) { long param[1]; param[0] = (long)&arg->a; return foo(param); } The IR right before BPF SimplifyPatchable phase: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %2:gpr = LDD killed %1:gpr, 0 %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr STD killed %3:gpr, %stack.0.param, 0 After SimplifyPatchable phase, the incorrect IR is generated: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0" Note that CORE_MEM pseudo op is introduced to encode memory operations related to CORE. In the above, we intend to check whether we have a store like *(%3:gpr + 0) = ... and if this is the case, we could replace it with *(%0:gpr + @"llvm.t:0:0$0:0") = ... Unfortunately, in the above, IR for the store is *(%stack.0.param + 0) = %3:gpr and the transformation should not happen. Note that we won't have a problem if the actual CORE dereference (arg->a) happens. This patch fixed the problem by skipping the CORE optimization if the use of the ADD_rr result is not the base address of the store operation. Differential Revision: https://reviews.llvm.org/D78466	2020-04-20 19:54:51 -07:00
Yonghong Song	29bc5dd194	[BPF] implement isTruncateFree and isZExtFree in BPFTargetLowering Currently, isTruncateFree() and isZExtFree() callbacks return false as they are not implemented in BPF backend. This may cause suboptimal code generation. For example, if the load in the context of zero extension has more than one use, the pattern zextload{i8,i16,i32} will not be generated. Rather, the load will be matched first and then the result is zero extended. For example, in the test together with this commit, we have I1: %0 = load i32, i32* %data_end1, align 4, !tbaa !2 I2: %conv = zext i32 %0 to i64 ... I3: %2 = load i32, i32* %data, align 4, !tbaa !7 I4: %conv2 = zext i32 %2 to i64 ... I5: %4 = trunc i64 %sub.ptr.lhs.cast to i32 I6: %conv13 = sub i32 %4, %2 ... The I1 and I2 will match to one zextloadi32 DAG node, where SUBREG_TO_REG is used to convert a 32bit register to 64bit one. During code generation, SUBREG_TO_REG is a noop. The %2 in I3 is used in both I4 and I6. If isTruncateFree() is false, the current implementation will generate a SLL_ri and SRL_ri for the zext part during lowering. This patch implement isTruncateFree() in the BPF backend, so for the above example, I3 and I4 will generate a zextloadi32 DAG node with SUBREG_TO_REG is generated during lowering to Machine IR. isZExtFree() is also implemented as it should help code gen as well. This patch also enables the change in https://reviews.llvm.org/D73985 since it won't kick in generates MOV_32_64 machine instruction. Differential Revision: https://reviews.llvm.org/D74101	2020-02-11 09:59:19 -08:00
Yonghong Song	d96c1bbaa0	[BPF] disable ReduceLoadWidth during SelectionDag phase The compiler may transform the following code ctx = ctx + reloc_offset ... (*(u32 *)ctx) & 0x8000 ... to ctx = ctx + reloc_offset ... (*(u8 *)(ctx + 1)) & 0x80 ... where reloc_offset will be replaced with a constant during AsmPrinter phase. The above transformed code will be rejected by the kernel verifier as it does not allow *(type *)((ctx + non_zero_offset1) + non_zero_offset2) style access pattern. It is hard at SelectionDag phase to identify whether a load is related to context or not. Sometimes, interprocedural analysis may be needed. So let us simply prevent such optimization from happening. Differential Revision: https://reviews.llvm.org/D73997	2020-02-04 18:37:43 -08:00
Yonghong Song	6d07802d63	[BPF] handle typedef of struct/union for CO-RE relocations Linux commit `1cf5b23988 (diff-289313b9fec99c6f0acfea19d9cfd949)` uses "#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)" to apply CO-RE relocations to all records including the following pattern: #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record) typedef struct { int a; } __t; #pragma clang attribute pop int test(__t *arg) { return arg->a; } The current approach to use struct/union type in the relocation record will result in an anonymous struct, which make later type matching difficult in bpf loader. In fact, current BPF backend will fail the above program with assertion: clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:796: ... Assertion `TypeName.size()' failed. clang will change to use the type of the base of the member access which will preserve the typedef modifier for the preserve_{struct,union}_access_index intrinsics in the above example. Here we adjust BPF backend to accept that the debuginfo type metadata may be 'typedef' and handle them properly. Differential Revision: https://reviews.llvm.org/D73902	2020-02-04 08:53:03 -08:00
Yonghong Song	fbb64aa698	[BPF] extend BTF_KIND_FUNC to cover global, static and extern funcs Previously extern function is added as BTF_KIND_VAR. This does not work well with existing BTF infrastructure as function expected to use BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO. This patch added extern function to BTF_KIND_FUNC. The two bits 0:1 of btf_type.info are used to indicate what kind of function it is: 0: static 1: global 2: extern Differential Revision: https://reviews.llvm.org/D71638	2020-01-10 09:06:31 -08:00
Yonghong Song	ffd57408ef	[BPF] Enable relocation location for load/store/shifts Previous btf field relocation is always at assignment like r1 = 4 which is converted from an ld_imm64 instruction. This patch did an optimization such that relocation instruction might be load/store/shift. Specifically, the following insns may also have relocation, except BPF_MOV: LDB, LDH, LDW, LDD, STB, STH, STW, STD, LDB32, LDH32, LDW32, STB32, STH32, STW32, SLL, SRL, SRA To accomplish this, a few BPF target specific codegen only instructions are invented. They are generated at backend BPF SimplifyPatchable phase, which is at early llc phase when SSA form is available. The new codegen only instructions will be converted to real proper instructions at the codegen and BTF emission stage. Note that, as revealed by a few tests, this optimization might actually generate more relocations: Scenario 1: if (...) { ... __builtin_preserve_field_info(arg->b2, 0) ... } else { ... __builtin_preserve_field_info(arg->b2, 0) ... } Compiler could do CSE to only have one relocation. But if both of the above are translated into codegen internal instructions, the compiler will not be able to do that. Scenario 2: offset = ... __builtin_preserve_field_info(arg->b2, 0) ... ... ... offset ... ... offset ... ... offset ... For whatever reason, the compiler might temporarily do copy propagation of the righthand side of the "offset" assignment like ... __builtin_preserve_field_info(arg->b2, 0) ... ... __builtin_preserve_field_info(arg->b2, 0) ... and CSE will be able to deduplicate later. But if these intrinsics are converted to BPF pseudo instructions, they will not be able to get deduplicated. I do not expect we have big instruction count difference. It may actually reduce instruction count since now relocation is in deeper insn dependency chain. 
For example, for test offset-reloc-fieldinfo-2.ll, this patch generates 7 instead of 6 relocations for non-alu32 mode, but it actually reduced instruction count from 29 to 26. Differential Revision: https://reviews.llvm.org/D71790	2019-12-26 09:07:39 -08:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Yonghong Song	6db023b99b	[BPF] add "llvm." prefix to BPF internally created globals Currently, BPF backend creates some global variables with name like <type_name>:<reloc_type>:<patch_imm>$<access_str> to carry certain information to BPF backend. With direct clang compilation, the following code in llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp is triggered and the above globals are emitted to the ELF file. (clang enabled this as opt flag -faddrsig is on by default.) if (TM.Options.EmitAddrsig) { // Emit address-significance attributes for all globals. OutStreamer->EmitAddrsig(); for (const GlobalValue &GV : M.global_values()) if (!GV.use_empty() && !GV.isThreadLocal() && !GV.hasDLLImportStorageClass() && !GV.getName().startswith("llvm.") && !GV.hasAtLeastLocalUnnamedAddr()) OutStreamer->EmitAddrsigSym(getSymbol(&GV)); } ... 10162: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:0:2048$0:117 10163: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:0:2112$0:126:0 10164: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND tcp_sock:1:8$0:31:6 ... While in llc, those globals are not emitted since EmitAddrsig default option is false for llc. The llc flag "-addrsig" can be used to enable the above code. This patch added "llvm." prefix to these internal globals so that they can be ignored in the above code and possibly other places. Differential Revision: https://reviews.llvm.org/D70703	2019-11-25 21:34:46 -08:00
Yonghong Song	fff2721286	[BPF] Fix CO-RE bugs with bitfields bitfield handling is not robust with current implementation. I have seen two issues as described below. Issue 1: struct s { long long f1; char f2; char b1:1; } *p; The current approach will generate an access bit size 56 (from b1 to the end of structure) which will be rejected as it is not power of 2. Issue 2: struct s { char f1; char b1:3; char b2:5; char b3:6; char b4:2; char f2; }; The LLVM will group 4 bitfields together with 2 bytes. But loading 2 bytes is not correct as it violates alignment requirement. Note that sometimes, LLVM breaks a large bitfield group into multiple groups, but not in this case. To resolve the above two issues, this patch takes a different approach. The alignment for the structure is used to construct the offset of the bitfield access. The bitfield incurred memory access is an aligned memory access with alignment/size equal to the alignment of the structure. This also simplified the code. This may not be the optimal memory access in terms of memory access width. But this should be okay since extracting the bitfield value will have the same amount of work regardless of what kind of memory access width. Differential Revision: https://reviews.llvm.org/D69837	2019-11-04 20:08:05 -08:00
Yonghong Song	c430533771	[BPF] fix a bug in __builtin_preserve_field_info() with FIELD_BYTE_SIZE During deriving proper bitfield access FIELD_BYTE_SIZE, function Member->getStorageOffsetInBits() is used to get llvm IR type storage offset in bits so that the byte size can permit aligned loads/stores with previously derived FIELD_BYTE_OFFSET. The function should only be used with bitfield members and it will assert if ASSERT is turned on during cmake build. Constant getStorageOffsetInBits() const { assert(getTag() == dwarf::DW_TAG_member && isBitField()); if (auto C = cast_or_null<ConstantAsMetadata>(getExtraData())) return C->getValue(); return nullptr; } This patch fixed the issue by using Member->isBitField() directly and a test case is added to cover this missing case. This issue is discovered when running Andrii's linux kernel CO-RE tests. Differential Revision: https://reviews.llvm.org/D69761	2019-11-03 08:18:28 -08:00
Yonghong Song	a27c998c00	[BPF] fix a CO-RE issue with -mattr=+alu32 Ilya Leoshkevich (<iii@linux.ibm.com>) reported an issue that with -mattr=+alu32 CO-RE has a segfault in BPF MISimplifyPatchable pass. The pattern will be transformed by MISimplifyPatchable pass looks like below: r5 = ld_imm64 @"b:0:0$0:0" r2 = ldw r5, 0 ... r2 ... // use r2 The pass will remove the intermediate 'ldw' instruction and replacing all r2 with r5 likes below: r5 = ld_imm64 @"b:0:0$0:0" ... r5 ... // use r5 Later, the ld_imm64 insn will be replaced with r5 = <patched immediate> for field relocation purpose. With -mattr=+alu32, the input code may become r5 = ld_imm64 @"b:0:0$0:0" w2 = ldw32 r5, 0 ... w2 ... // use w2 Replacing "w2" with "r5" is incorrect and will trigger compiler internal errors. To fix the problem, if the register class of ldw* dest register is sub_32, we just replace the original ldw* register with: w2 = w5 Directly replacing all uses of w2 with in-place constructed w5 for the use operand seems not working in all cases. The latest kernel will have -mattr=+alu32 on by default, so added this flag to all CORE tests. Tested with latest kernel bpf-next branch as well with this patch. Differential Revision: https://reviews.llvm.org/D69438	2019-10-25 14:27:25 -07:00
Yonghong Song	d46a6a9e68	[BPF] Remove relocation for patchable externs Previously, patchable extern relocations are introduced to patch external variables used for multi versioning in compile once, run everywhere use case. The load instruction will be converted into a move with an patchable immediate which can be changed by bpf loader on the host. The kernel verifier has evolved and is able to load and propagate constant values, so compiler relocation becomes unnecessary. This patch removed codes related to this. Differential Revision: https://reviews.llvm.org/D68760 llvm-svn: 374367	2019-10-10 15:33:09 +00:00
Yonghong Song	05e46979d2	[BPF] do compile-once run-everywhere relocation for bitfields A bpf specific clang intrinsic is introduced: u32 __builtin_preserve_field_info(member_access, info_kind) Depending on info_kind, different information will be returned to the program. A relocation is also recorded for this builtin so that bpf loader can patch the instruction on the target host. This clang intrinsic is used to get certain information to facilitate struct/union member relocations. The offset relocation is extended by 4 bytes to include relocation kind. Currently supported relocation kinds are enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; for __builtin_preserve_field_info. The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0. An example: struct s { int a; int b1:9; int b2:4; }; enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; void bpf_probe_read(void *, unsigned, const void *); int field_read(struct s *arg) { unsigned long long ull = 0; unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET); unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE); #ifdef USE_PROBE_READ bpf_probe_read(&ull, size, (const void *)arg + offset); unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ lshift = lshift + (size << 3) - 64; #endif #else switch(size) { case 1: ull = *(unsigned char *)((void *)arg + offset); break; case 2: ull = *(unsigned short *)((void *)arg + offset); break; case 4: ull = *(unsigned int *)((void *)arg + offset); break; case 8: ull = *(unsigned long long *)((void *)arg + offset); break; } unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #endif ull <<= lshift; if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS)) return (long long)ull >> __builtin_preserve_field_info(arg->b2, 
FIELD_RSHIFT_U64); return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); } There is a minor overhead for bpf_probe_read() on big endian. The code and relocation generated for field_read where bpf_probe_read() is used to access argument data on little endian mode: r3 = r1 r1 = 0 r1 = 4 <=== relocation (FIELD_BYTE_OFFSET) r3 += r1 r1 = r10 r1 += -8 r2 = 4 <=== relocation (FIELD_BYTE_SIZE) call bpf_probe_read r2 = 51 <=== relocation (FIELD_LSHIFT_U64) r1 = *(u64 *)(r10 - 8) r1 <<= r2 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) r0 = r1 r0 >>= r2 r3 = 1 <=== relocation (FIELD_SIGNEDNESS) if r3 == 0 goto LBB0_2 r1 s>>= r2 r0 = r1 LBB0_2: exit Compared to the above code between relocations FIELD_LSHIFT_U64 and FIELD_RSHIFT_U64, the code with big endian mode has four more instructions. r1 = 41 <=== relocation (FIELD_LSHIFT_U64) r6 += r1 r6 += -64 r6 <<= 32 r6 >>= 32 r1 = *(u64 *)(r10 - 8) r1 <<= r6 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) The code and relocation generated when using direct load. r2 = 0 r3 = 4 r4 = 4 if r4 s> 3 goto LBB0_3 if r4 == 1 goto LBB0_5 if r4 == 2 goto LBB0_6 goto LBB0_9 LBB0_6: # %sw.bb1 r1 += r3 r2 = *(u16 *)(r1 + 0) goto LBB0_9 LBB0_3: # %entry if r4 == 4 goto LBB0_7 if r4 == 8 goto LBB0_8 goto LBB0_9 LBB0_8: # %sw.bb9 r1 += r3 r2 = *(u64 *)(r1 + 0) goto LBB0_9 LBB0_5: # %sw.bb r1 += r3 r2 = *(u8 *)(r1 + 0) goto LBB0_9 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 r2 <<= r1 r1 = 60 r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit The verifier is able to do limited constant propagation following branches, so the following is the code actually traversed. 
r2 = 0 r3 = 4 <=== relocation r4 = 4 <=== relocation if r4 s> 3 goto LBB0_3 LBB0_3: # %entry if r4 == 4 goto LBB0_7 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 <=== relocation r2 <<= r1 r1 = 60 <=== relocation r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit For native load case, the load size is calculated to be the same as the size of load width LLVM otherwise used to load the value which is then used to extract the bitfield value. Differential Revision: https://reviews.llvm.org/D67980 llvm-svn: 374099	2019-10-08 18:23:17 +00:00
Yonghong Song	02ac75092d	[BPF] Handle offset reloc endpoint ending in the middle of chain properly While studying support for bitfields, I found an issue for an example like the one in test offset-reloc-middle-chain.ll. struct t1 { int c; }; struct s1 { struct t1 b; }; struct r1 { struct s1 a; }; #define _(x) __builtin_preserve_access_index(x) void test1(void *p1, void *p2, void *p3); void test(struct r1 *arg) { struct s1 *ps = _(&arg->a); struct t1 *pt = _(&arg->a.b); int *pi = _(&arg->a.b.c); test1(ps, pt, pi); } The IR looks like: %0 = llvm.preserve.struct.access(base, ...) %1 = llvm.preserve.struct.access(%0, ...) %2 = llvm.preserve.struct.access(%1, ...) using %0, %1 and %2 In this case, we need to generate three relocations corresponding to chains: (%0), (%0, %1) and (%0, %1, %2). After collecting all the chains, the current implementation processes each chain (in a map) with code generation sequentially. For example, after (%0) is processed, the code may look like: %0 = base + special_global_variable // llvm.preserve.struct.access(base, ...) is delisted // from the instruction stream. %1 = llvm.preserve.struct.access(%0, ...) %2 = llvm.preserve.struct.access(%1, ...) using %0, %1 and %2 When processing chain (%0, %1), the current implementation tries to visit intrinsic llvm.preserve.struct.access(base, ...) to get some of its properties and this caused a segfault. This patch fixed the issue by remembering all necessary information (kind, metadata, access_index, base) during analysis phase, so in code generation phase there is no need to examine the intrinsic call instructions. This also simplifies the code. Differential Revision: https://reviews.llvm.org/D68389 llvm-svn: 373621	2019-10-03 16:30:29 +00:00
Yonghong Song	c68ee0ce70	[BPF] Permit all user instructed offset relocations Currently, not all user specified relocations (with clang intrinsic __builtin_preserve_access_index()) will turn into relocations. In the current implementation, a __builtin_preserve_access_index() chain is turned into relocation only if the result of the clang intrinsic is used in a function call or a nonzero offset computation of getelementptr. For all other cases, the relocation request is ignored and the __builtin_preserve_access_index() is turned into regular getelementptr instructions. The main reason is to mimic bpf_probe_read() requirement. But there are other use cases where relocatable offset is generated but not used for bpf_probe_read(). This patch relaxed previous constraints on when to generate relocations. Now, all user __builtin_preserve_access_index() will have relocations generated. Differential Revision: https://reviews.llvm.org/D67688 llvm-svn: 372198	2019-09-18 03:49:07 +00:00
Yonghong Song	44b16bd4a5	[Transforms] Do not drop !preserve.access.index metadata Currently, when a GVN or CSE optimization happens, the llvm.preserve.access.index metadata is dropped. This caused a problem for BPF AbstractMemberAccess phase as it relies on the metadata (debuginfo types). This patch added proper hooks in lib/Transforms to preserve !preserve.access.index metadata. A test case is added to ensure metadata is preserved under CSE. Differential Revision: https://reviews.llvm.org/D65700 llvm-svn: 367769	2019-08-03 23:41:26 +00:00
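
The shift-based bitfield read described in commit 05e46979d2 above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `struct field_info` and `extract_bitfield` are hypothetical names, and the four values stand in for what `__builtin_preserve_field_info()` would return after relocation. The sketch takes the direct-load path (no `bpf_probe_read()`), so the big-endian `lshift` adjustment from the commit message is not needed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the relocated values; in a real BPF program
 * each one would come from __builtin_preserve_field_info(). */
struct field_info {
    uint32_t byte_offset; /* FIELD_BYTE_OFFSET */
    uint32_t byte_size;   /* FIELD_BYTE_SIZE: 1, 2, 4 or 8 */
    uint32_t lshift_u64;  /* FIELD_LSHIFT_U64 */
    uint32_t rshift_u64;  /* FIELD_RSHIFT_U64 */
    int is_signed;        /* FIELD_SIGNEDNESS */
};

/* Load byte_size bytes at base + byte_offset, zero-extend to 64 bits,
 * shift left to drop the bits above the field, then shift right to drop
 * the bits below it (arithmetic shift for signed fields). */
static int64_t extract_bitfield(const void *base, struct field_info fi)
{
    const unsigned char *p = (const unsigned char *)base + fi.byte_offset;
    uint64_t ull = 0;

    switch (fi.byte_size) {
    case 1: { uint8_t v;  memcpy(&v, p, sizeof(v)); ull = v; break; }
    case 2: { uint16_t v; memcpy(&v, p, sizeof(v)); ull = v; break; }
    case 4: { uint32_t v; memcpy(&v, p, sizeof(v)); ull = v; break; }
    case 8: { uint64_t v; memcpy(&v, p, sizeof(v)); ull = v; break; }
    }

    ull <<= fi.lshift_u64;
    if (fi.is_signed) {
        /* Right-shifting a negative value is implementation-defined in C;
         * mainstream compilers perform an arithmetic shift. */
        return (int64_t)ull >> fi.rshift_u64;
    }
    return (int64_t)(ull >> fi.rshift_u64);
}
```

With the little-endian values from the commit message for `b2` (offset 4, size 4, lshift 51, rshift 60), a 4-bit field stored in bits 9..12 of the loaded word is moved into the top four bits and then shifted back down, sign-extended when `is_signed` is set.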

56 Commits