llvm-project

Author	SHA1	Message	Date
Yingchi Long	70deb7bfe9	[BPF] expand cttz, ctlz for i32, i64 (#73668 ) Fixes: https://github.com/llvm/llvm-project/issues/62252 Depends on: #73667	2024-04-01 10:57:54 +08:00
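To illustrate what "expanding" `ctlz`/`cttz` means for a target without native bit-count instructions, here is a minimal sketch in portable C of one common shift-and-popcount expansion. It is an assumption for illustration only: the function names are hypothetical, and the exact sequence LLVM emits for BPF may differ.

```c
#include <stdint.h>

/* Population count via the classic bit-parallel reduction. */
unsigned popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (x * 0x01010101u) >> 24;
}

/* ctlz: smear the highest set bit into every lower position,
 * then count the bits that are still zero. Returns 32 for x == 0. */
unsigned ctlz32(uint32_t x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return popcount32(~x);
}

/* cttz: ~x & (x - 1) has ones exactly in the trailing-zero positions.
 * Returns 32 for x == 0. */
unsigned cttz32(uint32_t x) {
    return popcount32(~x & (x - 1u));
}
```

The 64-bit variants follow the same shape with one extra smear step and 64-bit masks.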
Sergei Barannikov	5e5b656102	[MC] Make `MCParsedAsmOperand::getReg()` return `MCRegister` (#86444 )	2024-03-25 05:13:48 +03:00
paperchalice	635ea257ec	[NewPM] Fix BPF build (#86379 ) Add Passes in dependency list	2024-03-23 13:06:58 +08:00
paperchalice	2aa5bae0c0	[NewPM][BPF] Add BPFPassRegistry.def NFCI (#86241 ) Prepare migration for dag-isel.	2024-03-23 12:53:26 +08:00
Jeremy Morse	b9d83eff25	[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736 ) These are the last remaining "trivial" changes to passes that use Instruction pointers for insertion. All of this should be NFC, it's just changing the spelling of how we identify a position. In one or two locations, I'm also switching uses of getNextNode etc to using std::next with iterators. This too should be NFC. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>	2024-03-19 16:36:29 +00:00
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and are not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
yonghong-song	0e0bfacff7	[BPF] Add support for may_goto insn (#85358 ) Alexei added may_goto insn in [1]. The asm syntax for may_goto looks like may_goto <label> The instruction represents a conditional branch but the condition is implicit. Later in bpf kernel verifier, the 'may_goto <label>' insn will be rewritten with an explicit condition. The encoding of 'may_goto' insn is enforced in [2] and is also implemented in this patch. In [3], 'may_goto' insn is encoded with raw bytes. I made the following change ``` --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -328,10 +328,7 @@ l_true: \ #define cond_break \ ({ __label__ l_break, l_continue; \ - asm volatile goto("1:.byte 0xe5; \ - .byte 0; \ - .long ((%l[l_break] - 1b - 8) / 8) & 0xffff; \ - .short 0" \ + asm volatile goto("may_goto %l[l_break]" \ :::: l_break); \ goto l_continue; \ l_break: break; ``` and ran the selftest with the latest llvm with this patch. All tests are passed. [1] https://lore.kernel.org/bpf/20240306031929.42666-1-alexei.starovoitov@gmail.com/ [2] https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com/ [3] https://lore.kernel.org/bpf/20240306031929.42666-4-alexei.starovoitov@gmail.com/	2024-03-15 07:24:28 -07:00
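The raw-byte macro removed in the diff above computes the 16-bit offset field by hand. A minimal sketch of that arithmetic in C, assuming little-endian byte order and taking the opcode byte `0xe5` from the quoted macro; the function names and the flat byte layout are hypothetical helpers, not part of LLVM or the kernel:

```c
#include <stdint.h>

/* Offset as computed by the quoted macro: distance from the end of the
 * 8-byte instruction to the label, counted in 8-byte instruction slots
 * and truncated to 16 bits. */
uint16_t may_goto_off(int64_t label_addr, int64_t insn_addr) {
    return (uint16_t)(((label_addr - insn_addr - 8) / 8) & 0xffff);
}

/* Lay out the 8 instruction bytes the way the macro does:
 * opcode, register byte, 16-bit offset (little-endian), 32-bit zero imm. */
void may_goto_encode(uint8_t out[8], int64_t label_addr, int64_t insn_addr) {
    uint16_t off = may_goto_off(label_addr, insn_addr);
    out[0] = 0xe5;                  /* opcode byte from the macro */
    out[1] = 0;                     /* dst_reg/src_reg unused */
    out[2] = (uint8_t)(off & 0xff); /* offset, low byte */
    out[3] = (uint8_t)(off >> 8);   /* offset, high byte */
    out[4] = out[5] = out[6] = out[7] = 0; /* imm = 0 */
}
```

With assembler support for `may_goto <label>`, this computation no longer has to be spelled out in inline asm.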
eddyz87	65b123e287	[BPF] rename 'arena' to 'address_space' (#85161 ) There are a few places where `arena` name is used for pointers in non-zero address space in BPF backend, rename these to use a more generic `address_space`: - macro `__BPF_FEATURE_ARENA_CAST` -> `__BPF_FEATURE_ADDR_SPACE_CAST` - name for arena global variables section `.arena.N` -> `.addr_space.N`	2024-03-14 19:20:06 -07:00
4ast	2aacb56e83	BPF address space insn (#84410 ) This commit aims to support BPF arena kernel side [feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/): - arena is a memory region accessible from both BPF program and userspace; - base pointers for this memory region differ between kernel and user spaces; - `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)` translates src_reg, a pointer in src_addr_space to dst_reg, equivalent pointer in dst_addr_space, {src,dst}_addr_space are immediate constants; - number 0 is assigned to kernel address space; - number 1 is assigned to user address space. On the LLVM side, the goal is to make load and store operations on arena pointers "transparent" for BPF programs: - assume that pointers with non-zero address space are pointers to arena memory; - assume that arena is identified by address space number; - assume that address space zero corresponds to kernel address space; - assume that every BPF-side load or store from arena is done via pointer in user address space, thus convert base pointers using `addr_space_cast(src_reg, 0, 1)`; Only load, store, cmpxchg and atomicrmw IR instructions are handled by this transformation. For example, the following C code: ```c #define __as __attribute__((address_space(1))) void copy(int __as *from, int __as *to) { *to = *from; } ``` Compiled to the following IR: ```llvm define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) { entry: %0 = load i32, ptr addrspace(1) %from, align 4 store i32 %0, ptr addrspace(1) %to, align 4 ret void } ``` Is transformed to: ```llvm %to2 = addrspacecast ptr addrspace(1) %to to ptr ;; ! %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; ! 
%0 = load i32, ptr %from1, align 4, !tbaa !3 store i32 %0, ptr %to2, align 4, !tbaa !3 ret void ``` And compiled as: ```asm r2 = addr_space_cast(r2, 0, 1) r1 = addr_space_cast(r1, 0, 1) r1 = *(u32 *)(r1 + 0) *(u32 *)(r2 + 0) = r1 exit ``` Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>	2024-03-13 02:27:25 +02:00
Yingchi Long	cf922e51b8	[BPF] lowering target address leaf nodes tconstpool (#73667 ) Adds custom lowering for tconstpool. Please ref: https://github.com/llvm/llvm-project/pull/73668 for test coverage	2024-03-06 19:47:44 +08:00
Rishabh Bali	fe42e72db2	[CodeGen] Port AtomicExpand to new Pass Manager (#71220 ) Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559	2024-02-25 18:42:22 +05:30
yonghong-song	c43ad6c0fd	BPF: Change callx insn encoding (#81546 ) Currently, the callx insn, which the kernel verifier does not support, uses the 32-bit imm field to store the target register. On the other hand, gcc uses the dst_reg field to store the target register. The gcc encoding is better. This patch adjusts the encoding to match gcc. Signed-off-by: Yonghong Song <yonghong.song@linux.dev>	2024-02-12 20:08:01 -08:00
James Y Knight	b856e77b2d	Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703 ) Targets affected: - NVPTX and BPF: set to 64 bits. - ARC, Lanai, and MSP430: set to 0 (they don't implement atomics). Those which didn't yet add AtomicExpandPass to their pass pipeline now do so. This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass. On all these targets, this now matches what Clang already does in the frontend. The only targets which do not configure AtomicExpandPass now are: - DirectX and SPIRV: they aren't normal backends. - AVR: a single-cpu architecture with no privileged/user divide, which could implement all atomics by disabling/enabling interrupts, regardless of size/alignment. Will be addressed by future work.	2024-01-08 22:34:28 -05:00
paperchalice	ffb1f20e0d	[CodeGen] Add flag to populate target pass names (#76328 ) `print-pipeline-passes` can show target pass names.	2024-01-03 09:07:02 +08:00
Alex Bradbury	80aeb62211	[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708 ) This helper function shortens examples like `cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to `Node->getConstantOperandVal(1);`. Implemented with: `git grep -l "cast<ConstantSDNode>$.->getOperand\(.$\)->getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.)->getOperand\((.)$\)->getZExtValue/\1->getConstantOperandVal(\2)/` and `git grep -l "cast<ConstantSDNode>$.\.getOperand\(.$\)->getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.)\.getOperand\((.)$\)->getZExtValue/\1.getConstantOperandVal(\2)/'`. With a couple of simple manual fixes needed. Result then processed by `git clang-format`.	2024-01-02 13:14:28 +00:00
Yingchi Long	ddf85b92aa	[BPF] improve error handling by custom lowering & fail() (#75088 ) Currently on mcpu=v3 we do not support sdiv, srem instructions. And the backend crashes with stacktrace & coredump, which is misleading for end users, as this is not a "bug" Add llvm bug reporting for sdiv/srem on ISel legalize-op phase. For clang frontend we can get detailed location & bug report. $ build/bin/clang -g -target bpf -c local/sdiv.c local/sdiv.c:1:35: error: unsupported signed division, please convert to unsigned div/mod. 1 \| int sdiv(int a, int b) { return a / b; } \| ^ 1 error generated. Fixes: #70433 Fixes: #48647 This also improves error handling for dynamic stack allocation: local/vla.c:2:3: error: unsupported dynamic stack allocation 2 \| int b[n]; \| ^ 1 error generated. Fixes: https://github.com/llvm/llvm-project/issues/57171	2023-12-13 13:41:52 +08:00
Yingchi Long	c4ac1d239f	[BPF][GlobalISel] select non-PreISelGenericOpcode (#75034 ) This selects non-PreISelGenericOpcode as-is. Depends on: #74999 Co-authored-by: Origami404 <Origami404@foxmail.com>	2023-12-12 16:19:34 +08:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
Yingchi Long	2460bf2fac	[BPF][GlobalISel] add initial gisel support for BPF (#74999 ) This adds initial codegen support for BPF backend. Only implemented ir-translator for "RET" (but not support isel). Depends on: #74998	2023-12-11 19:58:34 +08:00
Yingchi Long	75193b192a	[BPF] use target triple for pattern predicates (#74998 ) This eliminates uses of "CurDAG", which is SelectionDAG-specific and not compatible with GISel algorithms. (NFC)	2023-12-11 17:10:39 +08:00
yonghong-song	32d535195e	BPF: Emit an error for illegal LD_imm64 insn when LLVM_ENABLE_ASSERTI… (#74035 ) …ONS=OFF Jose reported an issue ([1]) where for the below illegal asm code ``` r0 = 1 + w3 ll ``` clang actually supports it and generates the object code. Further investigation finds that clang actually intends to reject the above code as well but only when the clang is built with LLVM_ENABLE_ASSERTIONS=ON. I later found that clang16 (built by redhat and centos) in fedora system has the same issue since they also have LLVM_ENABLE_ASSERTIONS=OFF ([2]). So let BPF backend report an error for the above case regardless of the LLVM_ENABLE_ASSERTIONS setting. [1] https://lore.kernel.org/bpf/87leahx2xh.fsf@oracle.com/#t [2] https://lore.kernel.org/bpf/840e33ec-ea4c-4b55-bda1-0be8d1e0324f@linux.dev/ Co-authored-by: Yonghong Song <yonghong.song@linux.dev>	2023-12-07 11:29:40 -08:00
Eduard Zingerman	030b8cb156	[BPF] Attribute preserve_static_offset for structs This commit adds a new BPF specific structure attribute `__attribute__((preserve_static_offset))` and a pass to deal with it. This attribute may be attached to a struct or union declaration, where it notifies the compiler that this structure is a "context" structure. The following limitations apply to context structures: - runtime environment might patch access to the fields of this type by updating the field offset; BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed. This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions. See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details. - runtime environment might disallow access to the field of the type through modified pointers. During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. If a register with such a tag is modified, BPF programs are not allowed to read or write memory using that register. See `kernel/bpf/verifier.c:check_mem_access` function in the Linux kernel source tree for details. Access to the structure fields is translated to IR as a sequence: - `(load (getelementptr %ptr %offset))` or - `(store (getelementptr %ptr %offset))` During instruction selection phase such sequences are translated as a single load instruction with embedded offset, e.g. `LDW %ptr, %offset`, which matches access pattern necessary for the restricted set of types described above (when `%offset` is static). 
Multiple optimizer passes might separate these instructions, this includes: - SimplifyCFGPass (sinking) - InstCombine (sinking) - GVN (hoisting) The `preserve_static_offset` attribute marks structures for which the following transformations happen: - at the early IR processing stage: - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`; - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`; - at the late IR processing stage this modification is undone. Such handling prevents various optimizer passes from generating sequences of instructions that would be rejected by BPF verifier. The `__attribute__((preserve_static_offset))` has a priority over `__attribute__((preserve_access_index))`. When the `preserve_static_offset` attribute is present, preserve access index transformations are not applied. This addresses the issue reported by the following thread: https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 This is a second attempt to commit this change, previous reverted commit is: cb13e9286b6d4e384b5d4203e853d44e2eff0f0f. The following items had been fixed: - test case bpf-preserve-static-offset-bitfield.c now uses `-triple bpfel` to avoid different codegen for little/big endian targets. - BPFPreserveStaticOffset.cpp:removePAICalls() modified to avoid use after free for `WorkList` elements `V`. Differential Revision: https://reviews.llvm.org/D133361	2023-12-05 19:21:42 +02:00
Eduard Zingerman	2484469803	Revert "[BPF] Attribute preserve_static_offset for structs" This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f. Buildbot reports MSAN failures in tests added in this commit: https://lab.llvm.org/buildbot/#/builders/5/builds/38806 Failing tests: LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll	2023-11-30 22:29:45 +02:00
Eduard Zingerman	cb13e9286b	[BPF] Attribute preserve_static_offset for structs This commit adds a new BPF specific structure attribute `__attribute__((preserve_static_offset))` and a pass to deal with it. This attribute may be attached to a struct or union declaration, where it notifies the compiler that this structure is a "context" structure. The following limitations apply to context structures: - runtime environment might patch access to the fields of this type by updating the field offset; BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed. This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions. See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details. - runtime environment might disallow access to the field of the type through modified pointers. During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. If a register with such a tag is modified, BPF programs are not allowed to read or write memory using that register. See `kernel/bpf/verifier.c:check_mem_access` function in the Linux kernel source tree for details. Access to the structure fields is translated to IR as a sequence: - `(load (getelementptr %ptr %offset))` or - `(store (getelementptr %ptr %offset))` During instruction selection phase such sequences are translated as a single load instruction with embedded offset, e.g. `LDW %ptr, %offset`, which matches access pattern necessary for the restricted set of types described above (when `%offset` is static). 
Multiple optimizer passes might separate these instructions, this includes: - SimplifyCFGPass (sinking) - InstCombine (sinking) - GVN (hoisting) The `preserve_static_offset` attribute marks structures for which the following transformations happen: - at the early IR processing stage: - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`; - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`; - at the late IR processing stage this modification is undone. Such handling prevents various optimizer passes from generating sequences of instructions that would be rejected by BPF verifier. The `__attribute__((preserve_static_offset))` has a priority over `__attribute__((preserve_access_index))`. When the `preserve_static_offset` attribute is present, preserve access index transformations are not applied. This addresses the issue reported by the following thread: https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 Differential Revision: https://reviews.llvm.org/D133361	2023-11-30 19:45:03 +02:00
yonghong-song	e247e6ff27	[BPF] Add asm support for JSET insn (#73161 ) BPF upstream reported that JSET insn is not supported in inline asm ([1]). BPF_JSET insn is part of BPF ISA so let us add asm support for it now. [1] https://lore.kernel.org/bpf/2e8a1584-a289-4b2e-800c-8b463e734bcb@linux.dev/	2023-11-27 18:43:24 -08:00
Nikita Popov	7eeedc124f	[CodeGen] Make some includes explicit (NFC) Explicitly include some headers or forward-declare types, in preparation for removing an include that pulls in many transitive headers.	2023-11-24 14:43:18 +01:00
Paulo Matos	7b9d73c2f9	[NFC] Remove Type::getInt8PtrTy (#71029 ) Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.	2023-11-07 17:26:26 +01:00
yonghong-song	32e35b21b5	[BPF] Skip modifiers for __builtin_btf_type_id() local type (#71094 ) BPF upstream reported an inconsistent behavior w.r.t. BPF_TYPE_ID_LOCAL vs. BPF_TYPE_ID_TARGET (or BPF_TYPE_ID_REMOTE in LLVM terminology). For BPF_TYPE_ID_TARGET, all modifiers (like 'const' and 'volatile') are ignored in the final type encoding. For example, for type 'const struct foo', the eventual encoding in BTF relocation is 'struct foo'. This facilitates libbpf matching corresponding kernel types without considering any modifiers. Current behavior for BPF_TYPE_ID_LOCAL is different. It will encode 'const struct foo' in BTF relocation and such discrepancy confused users ([1]). This patch fixes this discrepancy by making BPF_TYPE_ID_LOCAL BTF type representation the same as BPF_TYPE_ID_TARGET. This should have minimum user impact since ultimately the user wants to get a real type, not a 'const' type modifier. The selftest builtin-btf-type-id-2.ll is used to test BPF_TYPE_ID_TARGET with 'const' modifier. Adapt the same test for BPF_TYPE_ID_LOCAL. And the below diff shows now both BPF_TYPE_ID_LOCAL and BPF_TYPE_ID_TARGET produce the same type: $ diff test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll --- test/CodeGen/BPF/BTF/builtin-btf-type-id-2.ll 2023-07-30 16:58:20.657528310 -0700 +++ test/CodeGen/BPF/BTF/builtin-btf-type-id-local.ll 2023-11-02 10:23:25.356959008 -0700 @@ -6,7 +6,7 @@ ; int a; ; }; ; int test(void) { -; return __builtin_btf_type_id(*(const struct s *)0, 1); +; return __builtin_btf_type_id(*(const struct s *)0, 0); ; } ; Compilation flag: ; clang -target bpf -O2 -g -S -emit-llvm -Xclang -disable-llvm-passes test.c $ [1] https://lore.kernel.org/bpf/CAN+4W8h3yDjkOLJPiuKVKTpj_08pBz8ke6vN=Lf8gcA=iYBM-g@mail.gmail.com/ Co-authored-by: Yonghong Song <yonghong.song@linux.dev>	2023-11-03 12:52:16 -07:00
Kazu Hirata	4a0ccfa865	Use llvm::endianness::{big,little,native} (NFC) Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class as opposed to an enum. This patch replaces support::{big,little,native} with llvm::endianness::{big,little,native}.	2023-10-12 21:21:45 -07:00
Kazu Hirata	a9d5056862	Use llvm::endianness (NFC) Now that llvm::support::endianness has been renamed to llvm::endianness, we can use the shorter form. This patch replaces support::endianness with llvm::endianness.	2023-10-10 21:54:15 -07:00
Eduard Zingerman	f22442553b	[BPF] Check jump and memory offsets to avoid truncation The following assembly code should issue two errors specifying that both jump and load offsets are out of range: if r1 > r2 goto +100500 r1 = *(u64 *)(r1 - 100500) This commit updates BPFAsmParser to check that: - offset specified for jump is either identifier (label) or a 16-bit signed constant; - offset specified for memory operations is a signed 16-bit constant. (Which matches expectations in the BPFELFObjectWriter and BPFMCCodeEmitter). Differential Revision: https://reviews.llvm.org/D158425	2023-09-23 19:39:24 +03:00
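The range rule described in this commit can be sketched as a one-line predicate in C. This is an illustrative assumption, not the actual BPFAsmParser code; `fits_s16` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a constant offset can be stored in a signed 16-bit field
 * without truncation, i.e. it lies in [-32768, 32767]. */
bool fits_s16(int64_t value) {
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

Both offsets in the example above (`+100500` and `-100500`) fall outside this range, which is why each line is expected to produce an error.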
Eduard Zingerman	d15f96fe4b	[BPF][DebugInfo] Show CO-RE relocations in llvm-objdump Extend llvm-objdump to show CO-RE relocations when `-r` option is passed and object file has .BTF and .BTF.ext sections. For example, the following C program: #define __pai __attribute__((preserve_access_index)) struct foo { int i; int j;} __pai; struct bar { struct foo f[7]; } __pai; extern void sink(void *); void root(struct bar *bar) { sink(&bar[2].f[3].j); } Should lead to the following objdump output: $ clang --target=bpf -O2 -g t.c -c -o - \| \ llvm-objdump --no-addresses --no-show-raw-insn -dr - ... r2 = 0x94 CO-RE <byte_off> [2] struct bar::[2].f[3].j (2:0:3:1) r1 += r2 call -0x1 R_BPF_64_32 sink exit ... More examples could be found in unit tests, see BTFParserTest.cpp. To achieve this: - Move CO-RE relocation kinds definitions from BPFCORE.h to BTF.h. - Extend BTF.h with types derived from BTF::CommonType, e.g. BTF::IntType and BTF::StructType, to allow dyn_cast() and access to type additional data. - Extend BTFParser to load BTF type and relocation data. - Modify llvm-objdump.cpp to create instance of BTFParser when disassembly of object file with BTF sections is processed and `-r` flag is supplied. Additional information about CO-RE is available at [1]. [1] https://docs.kernel.org/bpf/llvm_reloc.html Depends on D149058 Differential Revision: https://reviews.llvm.org/D150079	2023-09-21 21:59:10 +03:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Nick Desaulniers	86735a4353	reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66264 ) reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003) This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5. Fix up build failures in targets I missed in #66003 Kept as 3 commits for reviewers to see better what's changed. Will squash when merging. - reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003) - fix all the targets I missed in #66003 - fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll	2023-09-13 13:31:24 -07:00
Sergei Barannikov	a479be0f39	[MC] Change tryParseRegister to return ParseStatus (NFC) This finishes the work of replacing OperandMatchResultTy with ParseStatus, started in D154101. As a drive-by change, rename some RegNo variables to just Reg (a leftover from the days when RegNo had 'unsigned' type).	2023-09-06 10:28:12 +03:00
Eduard Zingerman	651e644595	[BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree() Replace `BPFMIPeepholeTruncElim` by adding an overload for `TargetLowering::isZExtFree()` aware that zero extension is free for `ISD::LOAD`. Short description ================= The `BPFMIPeepholeTruncElim` handles two patterns: Pattern #1: %1 = LDB %0, ... %1 = LDB %0, ... %2 = AND_ri %1, 0xff -> %2 = MOV_ri %1 <-- (!) Pattern #2: bb.1: bb.1: %a = LDB %0, ... %a = LDB %0, ... br %bb3 br %bb3 bb.2: bb.2: %b = LDB %0, ... -> %b = LDB %0, ... br %bb3 br %bb3 bb.3: bb.3: %1 = PHI %a, %b %1 = PHI %a, %b %2 = AND_ri %1, 0xff %2 = MOV_ri %1 <-- (!) Plus variations: - AND_ri_32 instead of AND_ri - SLL/SLR instead of AND_ri - LDH, LDW, LDB32, LDH32, LDW32 Both patterns could be handled by built-in transformations at instruction selection phase if suitable `isZExtFree()` implementation is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`. When evaluated on BPF kernel selftests and remove_truncate_*.ll LLVM test cases, this revision performs slightly better than BPFMIPeepholeTruncElim, see "Impact" section below for details. Commit also adds a few test cases to make sure that patterns in question are handled. Long description ================ Why this works: Pattern #1 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 %cond = icmp eq i8 %a, 0 ret i1 %cond } Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command: ... Type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 t19: i64 = and t16, Constant:i64<255> t17: i64 = setcc t19, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... 
Replacing.1 t19: i64 = and t16, Constant:i64<255> With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64 and 0 other values ... Optimized type-legalized selection DAG: %bb.0 'foo:entry' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t17: i64 = setcc t20, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Optimized type-legalized selection DAG: - `t19 = and t16, 255` had been replaced by `t16` (load). - Patterns like `(and (load ... i8), 255)` are replaced by `load` in `DAGCombiner::BackwardsPropagateMask` called from `DAGCombiner::visitAND`. - Similarly patterns like `(shl (srl ..., 56), 56)` are replaced by `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge, look for `TLI.shouldFoldConstantShiftPairToMask()` call). Why this works: Pattern #2 -------------------------- Consider the following example: define i1 @foo(ptr %p) { entry: %a = load i8, ptr %p, align 1 br label %next next: %cond = icmp eq i8 %a, 0 ret i1 %cond } Consider log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command. Log for first basic block: Initial selection DAG: %bb.0 'foo:entry' SelectionDAG has 9 nodes: t0: ch,glue = EntryToken t3: i64 = Constant<0> t2: i64,ch = CopyFromReg t0, Register:i64 %1 t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64 t6: i64 = zero_extend t5 t8: ch = CopyToReg t0, Register:i64 %0, t6 ... Replacing.1 t6: i64 = zero_extend t5 With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 and 0 other values ... 
Optimized lowered selection DAG: %bb.0 'foo:entry' SelectionDAG has 7 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %1 t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64 t8: ch = CopyToReg t0, Register:i64 %0, t9 Note: - Initial selection DAG: - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))` w/o special `isZExtFree()` overload added by this commit it is instead lowered as `t6 = (any_extend (load ...))`. - The decision to generate `zero_extend` or `any_extend` is done in `RegsForValue::getCopyToRegs` called from `SelectionDAGBuilder::CopyValueToVirtualRegister`: - if `isZExtFree()` for load returns true `zero_extend` is used; - `any_extend` is used otherwise. - Optimized lowered selection DAG: - `t6 = (any_extend (load ...))` is replaced by `t9 = load ..., zext from i8` This is done by `DagCombiner.cpp:tryToFoldExtOfLoad()` called from `DAGCombiner::visitZERO_EXTEND`. Log for second basic block: Initial selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t5: i8 = truncate t4 t8: i1 = setcc t5, Constant:i8<0>, seteq:ch t9: i64 = any_extend t8 t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Replacing.2 t18: i64 = and t4, Constant:i64<255> With: t4: i64 = AssertZext t2, ValueType:ch:i8 ... Type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 13 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t18: i64 = and t4, Constant:i64<255> t16: i64 = setcc t18, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... 
Optimized type-legalized selection DAG: %bb.1 'foo:next' SelectionDAG has 11 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t4: i64 = AssertZext t2, ValueType:ch:i8 t16: i64 = setcc t4, Constant:i64<0>, seteq:ch t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16 t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1 ... Note: - Initial selection DAG: - `%0` is an input value for this basic block, it corresponds to the load instruction (`t9`) from the first basic block. - It is accessed within basic block via `t4` (AssertZext (CopyFromReg t0, ...)). - The `AssertZext` is generated by RegsForValue::getCopyFromRegs called from SelectionDAGBuilder::getCopyFromRegs, it is generated only when `LiveOutInfo` with known number of leading zeros is present for `%0`. - Known register bits in `LiveOutInfo` are computed by `SelectionDAG::computeKnownBits` called from `SelectionDAGISel::ComputeLiveOutVRegInfo`. - `computeKnownBits()` generates leading zeros information for `(load ..., zext from ...)` but does not generate leading zeros information for `(load ..., anyext from ...)`. This is why `isZExtFree()` added in this commit is important. - Type-legalized selection DAG: - `t5 = truncate t4` is replaced by `t18 = and t4, 255` - Optimized type-legalized selection DAG: - `t18 = and t4, 255` is replaced by `t4`, this is done by `DAGCombiner::SimplifyDemandedBits` called from `DAGCombiner::visitAND`, which simplifies patterns like `(and (assertzext ...))` Impact ------ This change covers all remove_truncate_*.ll test cases: - for -mcpu=v4 there are no changes in the generated code; - for -mcpu=v2 code generated for remove_truncate_7 and remove_truncate_8 improved slightly, for other tests it is unchanged. 
For remove_truncate_7:

  Before this revision                 After this revision
  --------------------                 -------------------
  r1 <<= 0x20                          r1 <<= 0x20
  r1 >>= 0x20                          r1 >>= 0x20
  if r1 == 0x0 goto +0x2 <LBB0_2>      if r1 == 0x0 goto +0x2 <LBB0_2>
  r1 = *(u32 *)(r2 + 0x0)              r0 = *(u32 *)(r2 + 0x0)
  goto +0x1 <LBB0_3>                   goto +0x1 <LBB0_3>
  <LBB0_2>:                            <LBB0_2>:
  r1 = *(u32 *)(r2 + 0x4)              r0 = *(u32 *)(r2 + 0x4)
  <LBB0_3>:                            <LBB0_3>:
  r0 = r1                              exit
  exit

For remove_truncate_8:

  Before this revision                 After this revision
  --------------------                 -------------------
  r2 = *(u32 *)(r1 + 0x0)              r2 = *(u32 *)(r1 + 0x0)
  r3 = r2                              r3 = r2
  r3 <<= 0x20                          r3 <<= 0x20
  r4 = r3                              r3 s>>= 0x20
  r4 s>>= 0x20
  if r4 s> 0x2 goto +0x5 <LBB0_3>      if r3 s> 0x2 goto +0x4 <LBB0_3>
  r4 = *(u32 *)(r1 + 0x4)              r3 = *(u32 *)(r1 + 0x4)
  r3 >>= 0x20
  if r3 >= r4 goto +0x2 <LBB0_3>       if r2 >= r3 goto +0x2 <LBB0_3>
  r2 += 0x2                            r2 += 0x2
  *(u32 *)(r1 + 0x0) = r2              *(u32 *)(r1 + 0x0) = r2
  <LBB0_3>:                            <LBB0_3>:
  r0 = 0x3                             r0 = 0x3
  exit                                 exit

For kernel BPF selftests the statistics are as follows:
- For -mcpu=v4: 9 out of 655 object files have differences; in all
  cases the total number of instructions marginally decreased
  (-27 instructions).
- For -mcpu=v2: 9 out of 655 object files have differences:
  - For 19 object files the number of instructions decreased
    (-129 instructions in total): some redundant `rX &= 0xffff` and
    register-to-register assignments were removed;
  - For 2 object files the number of instructions increased, by
    +2 instructions in each file.

Both -mcpu=v2 instruction increases can be reduced to the same example:

  define void @foo(ptr %p) {
  entry:
    %a = load i32, ptr %p, align 4
    %b = sext i32 %a to i64
    %c = icmp ult i64 1, %b
    br i1 %c, label %next, label %end

  next:
    call void inttoptr (i64 62 to ptr)(i32 %a)
    br label %end

  end:
    ret void
  }

Note that this example uses the value loaded into `%a` both as a
sign-extended value (`%b`) and as a zero-extended value (`%a` passed as
a parameter).

Here is the difference in the final assembly code:

  Before this revision                 After this revision
  --------------------                 -------------------
  r1 = *(u32 *)(r1 + 0)                r1 = *(u32 *)(r1 + 0)
  r1 <<= 32                            r1 <<= 32
  r1 s>>= 32                           r1 s>>= 32
  if r1 < 2 goto <LBB0_2>              if r1 < 2 goto <LBB0_2>
                                       r1 <<= 32
                                       r1 >>= 32
  call 62                              call 62
  <LBB0_2>:                            <LBB0_2>:
  exit                                 exit

Before this commit `%a` is passed to the call as a sign-extended value;
after this commit `%a` is passed to the call as a zero-extended value.
Both are correct because the 32-bit sub-register is the same.

The difference comes from how `DAGCombiner` operates on the initial DAG.

Initial selection DAG before this commit:

  t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
  t6: i64 = any_extend t5          <--------------------- (1)
  t8: ch = CopyToReg t0, Register:i64 %0, t6
  t9: i64 = sign_extend t5
  t12: i1 = setcc Constant:i64<1>, t9, setult:ch

Initial selection DAG after this commit:

  t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
  t6: i64 = zero_extend t5         <--------------------- (2)
  t8: ch = CopyToReg t0, Register:i64 %0, t6
  t9: i64 = sign_extend t5
  t12: i1 = setcc Constant:i64<1>, t9, setult:ch

The node `t9` is processed before node `t6`, and the `load` instruction
is combined into a load with sign extension:

  Replacing.1 t9: i64 = sign_extend t5
  With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64
   and 0 other values
  Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
  With: t31: i32 = truncate t30
   and 1 other values

This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from
`DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6`, which
is `any_extend` in (1) and `zero_extend` in (2).
`tryToFoldExtOfLoad()` rewrites such uses of `t5` differently:
- `any_extend` is simply removed;
- `zero_extend` is replaced by `and t30, 0xffffffff`, which is later
  converted to a pair of shifts. This pair of shifts survives until the
  end of translation.

Differential Revision: https://reviews.llvm.org/D157870	2023-08-22 00:04:51 +03:00
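The claim in the commit message above, that passing `%a` sign-extended or zero-extended is equally correct because the 32-bit sub-register is the same, can be checked with a small sketch. The helper names below are hypothetical and model the BPF shift pairs (`r <<= 32; r >>= 32` and `r <<= 32; r s>>= 32`) with plain C++ integer arithmetic; this is an illustration, not code from the LLVM backend.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend the low 32 bits with a mask, as in `and t30, 0xffffffff`.
constexpr uint64_t zextByMask(uint64_t X) { return X & 0xffffffffULL; }

// Zero-extend with a shift pair, as in `r1 <<= 32; r1 >>= 32`.
constexpr uint64_t zextByShifts(uint64_t X) { return (X << 32) >> 32; }

// Sign-extend the low 32 bits, the effect of `r1 <<= 32; r1 s>>= 32`.
// Written with casts so the result does not rely on signed shifts.
constexpr int64_t sext32(uint64_t X) {
  return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(X)));
}

// Low 32 bits of a 64-bit register (the 32-bit sub-register).
constexpr uint32_t lowHalf(uint64_t X) { return static_cast<uint32_t>(X); }

// Checks, for one input, that the mask and the shift pair agree and that
// both extensions leave the 32-bit sub-register untouched.
inline void checkExtensions(uint64_t X) {
  assert(zextByMask(X) == zextByShifts(X));
  assert(lowHalf(zextByShifts(X)) == lowHalf(X));
  assert(lowHalf(static_cast<uint64_t>(sext32(X))) == lowHalf(X));
}
```

Calling `checkExtensions` on values with the sign bit of the low half set (for example `0x80000001`) exercises the case where the two extensions differ in the upper 32 bits only.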
Sergei Barannikov	0e79111e4d	[AVR][BPF][Lanai][Xtensa] Replace OperandMatchResultTy with ParseStatus (NFC)

ParseStatus is slightly more convenient to use because of its implicit conversion from bool, which allows writing something like:
```
return Error(L, "msg");
```
whereas with OperandMatchResultTy it had to be:
```
Error(L, "msg");
return MatchOperand_ParseFail;
```
It also has a more appropriate name, since parse* methods are not only for parsing operands.

Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D158275	2023-08-20 14:20:28 +03:00
Eduard Zingerman	8f28e8069c	[BPF] support for BPF_ST instruction in codegen

Generate store immediate instruction when CPUv4 is enabled. For example:

  $ cat test.c
  struct foo {
    unsigned char  b;
    unsigned short h;
    unsigned int   w;
    unsigned long  d;
  };
  void bar(volatile struct foo *p) {
    p->b = 1;
    p->h = 2;
    p->w = 3;
    p->d = 4;
  }

  $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
  ...
  0000000000000000 <bar>:
     0: 72 01 00 00 01 00 00 00  *(u8 *)(r1 + 0x0) = 0x1
     1: 6a 01 02 00 02 00 00 00  *(u16 *)(r1 + 0x2) = 0x2
     2: 62 01 04 00 03 00 00 00  *(u32 *)(r1 + 0x4) = 0x3
     3: 7a 01 08 00 04 00 00 00  *(u64 *)(r1 + 0x8) = 0x4
     4: 95 00 00 00 00 00 00 00  exit

Take special care to:
- apply the `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST;
- validate the immediate value when the BPF_ST write is 64-bit: BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such a write when the immediate is -1, but it is incorrect to generate such a write when the immediate is +0xffff_ffff.

This commit was previously reverted in e66affa17e32. The reason for the revert was an unrelated bug in the BPF backend, triggered by the test case added in this commit when LLVM is built with LLVM_ENABLE_EXPENSIVE_CHECKS. The bug was fixed in D157806.

Differential Revision: https://reviews.llvm.org/D140804	2023-08-16 17:51:28 +03:00
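The immediate validation rule from the BPF_ST commit above (a 64-bit store sign-extends its 32-bit immediate, so -1 is acceptable but +0xffff_ffff is not) can be sketched as follows. The function names are hypothetical and chosen for illustration only; the actual check in the backend may be written differently.

```cpp
#include <cassert>
#include <cstdint>

// Value that a 64-bit store-immediate would write for a given 32-bit
// immediate field, assuming the immediate is sign-extended to 64 bits.
constexpr uint64_t storedValue(int32_t Imm) {
  return static_cast<uint64_t>(static_cast<int64_t>(Imm));
}

// True if the 64-bit constant can be represented as a sign-extended
// 32-bit immediate, i.e. the store would write exactly this value.
constexpr bool fitsSignExtendedImm32(int64_t Value) {
  return Value >= INT32_MIN && Value <= INT32_MAX;
}

// The two cases named in the commit message, plus the constant from the
// example program.
inline void checkImmediates() {
  assert(fitsSignExtendedImm32(-1));            // stored as 0xffffffffffffffff
  assert(!fitsSignExtendedImm32(0xffffffffLL)); // would be stored as -1
  assert(fitsSignExtendedImm32(4));             // p->d = 4 from the example
}
```

The second case is the interesting one: an immediate field holding `0xffffffff` is read as -1, so `storedValue(-1)` yields `0xffffffffffffffff` rather than the intended `0x00000000ffffffff`.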
Eduard Zingerman	08d92dedd2	[BPF] Fix in/out argument constraints for CORE_MEM instructions

When LLVM is built with the `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option, the following C code snippet:

  struct t {
    int a;
  } __attribute__((preserve_access_index));

  void test(struct t *t) {
    t->a = 42;
  }

causes an assertion failure:

  $ clang -g -O2 -c --target=bpf -mcpu=v2 t.c -o /dev/null

  Function Live Ins: $r1 in %0

  bb.0.entry:
    liveins: $r1
    DBG_VALUE $r1, $noreg, !"t", ...
    %0:gpr = COPY $r1
    DBG_VALUE %0:gpr, $noreg, !"t", ...
    %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
    %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
    %4:gpr = MOV_ri 42
    CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
    RET debug-location !25; t.c:7:1

  *** Bad machine code: Explicit definition marked as use ***
  - function:    test
  - basic block: %bb.0 entry (0x6210000d8a90)
  - instruction: CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
  - operand 0:   killed %4:gpr

This happens because the `CORE_MEM` instruction is defined to have output operands:

  def CORE_MEM : TYPE_LD_ST<BPF_MEM.Value, BPF_W.Value,
                            (outs GPR:$dst),
                            (ins u64imm:$opcode, GPR:$src, u64imm:$offset),
                            "$dst = core_mem($opcode, $src, $offset)",
                            []>;

As documented in [1]:

> By convention, the LLVM code generator orders instruction operands
> so that all register definitions come before the register uses, even
> on architectures that are normally printed in other orders.

In other words, the first argument of `CORE_MEM` is considered to be a "def", while in reality it is a "use":

  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
  %4:gpr = MOV_ri 42
     '---------------.
                     v
  CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...

Here is how `CORE_MEM` is constructed in `BPFMISimplifyPatchable::checkADDrr()`:

  BuildMI(*DefInst->getParent(), *DefInst, DefInst->getDebugLoc(),
          TII->get(COREOp))
      .add(DefInst->getOperand(0)).addImm(Opcode).add(*BaseOp)
      .addGlobalAddress(GVal);

Note that the first operand is constructed as `.add(DefInst->getOperand(0))`.
For `LD{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a destination register of a load, so instruction is constructed in accordance with `outs` declaration. For `ST{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a source register of a store (value to be stored), so instruction violates the `outs` declaration. This commit fixes the issue by splitting `CORE_MEM` in three instructions: `CORE_ST`, `CORE_LD64`, `CORE_LD32` with correct `outs` specifications. [1] https://llvm.org/docs/CodeGenerator.html#the-machineinstr-class Differential Revision: https://reviews.llvm.org/D157806	2023-08-15 02:34:21 +03:00
Eduard Zingerman	27026fe563	[BPF] Reset machine register kill mark in BPFMISimplifyPatchable

When LLVM is built with the `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option, the following C code snippet:

  struct t {
    unsigned long a;
  } __attribute__((preserve_access_index));

  void foo(volatile struct t *t, volatile unsigned long *p) {
    *p = t->a;
    *p = t->a;
  }

causes an assertion failure:

  $ clang -g -O2 -c --target=bpf -mcpu=v2 t2.c -o /dev/null

  # After BPF PreEmit SimplifyPatchable
  # Machine code for function foo: IsSSA, TracksLiveness
  Function Live Ins: $r1 in %0, $r2 in %1

  bb.0.entry:
    liveins: $r1, $r2
    DBG_VALUE $r1, $noreg, !"t", !DIExpression()
    DBG_VALUE $r2, $noreg, !"p", !DIExpression()
    %1:gpr = COPY $r2
    DBG_VALUE %1:gpr, $noreg, !"p", !DIExpression()
    %0:gpr = COPY $r1
    DBG_VALUE %0:gpr, $noreg, !"t", !DIExpression()
    %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
    %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
    %5:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
    STD killed %5:gpr, %1:gpr, 0
    %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
    %8:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
    STD killed %8:gpr, %1:gpr, 0
    RET

  # End machine code for function foo.

  *** Bad machine code: Using a killed virtual register ***
  - function:    foo
  - basic block: %bb.0 entry (0x6210000e6690)
  - instruction: %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
  - operand 2:   killed %2:gpr

This happens because of the way `BPFMISimplifyPatchable::processDstReg()` updates the second operand of the `ADD_rr` instruction. Code before `BPFMISimplifyPatchable`:

  .-> %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  |
  |`----------------.
  |   %3:gpr = LDD %2:gpr, 0
  |   %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %3:gpr <--- (1)
  |   %5:gpr = LDD killed %4:gpr, 0       ^^^^^^^^^^^^^
  |   STD killed %5:gpr, %1:gpr, 0        this is updated
   `----------------.
      %6:gpr = LDD %2:gpr, 0
      %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %6:gpr <--- (2)
      %8:gpr = LDD killed %7:gpr, 0       ^^^^^^^^^^^^^
      STD killed %8:gpr, %1:gpr, 0        this is updated

Instructions (1) and (2) would be updated to:

  ADD_rr %0:gpr(tied-def 0), killed %2:gpr

The `killed` mark is inherited from the machine operands `killed %3:gpr` and `killed %6:gpr`, which are updated in place by `processDstReg()`.

This commit updates `processDstReg()` to reset kill marks for the updated machine operands, keeping liveness information conservatively correct.

Differential Revision: https://reviews.llvm.org/D157805	2023-08-15 02:23:38 +03:00
Eduard Zingerman	e66affa17e	Revert "[BPF] support for BPF_ST instruction in codegen"

This reverts commit 92e28e397d4ccf1bff075f48e22cf1e23a7d02bf.
Reverting to investigate buildbot failure reported in [1].

field-reloc-st-imm.ll:

  *** Bad machine code: Explicit definition must be a register ***
  - function:    bar
  - basic block: %bb.0 entry (0x742f318)
  - instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", ...
  - operand 0:   3

  *** Bad machine code: Explicit definition must be a register ***
  - function:    bar
  - basic block: %bb.0 entry (0x742f318)
  - instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", ...
  - operand 0:   4

  LLVM ERROR: Found 4 machine code errors.

[1] https://lab.llvm.org/buildbot/#/builders/16/builds/52877	2023-08-11 02:23:40 +03:00
Eduard Zingerman	92e28e397d	[BPF] support for BPF_ST instruction in codegen

Generate store immediate instruction when CPUv4 is enabled. For example:

  $ cat test.c
  struct foo {
    unsigned char  b;
    unsigned short h;
    unsigned int   w;
    unsigned long  d;
  };
  void bar(volatile struct foo *p) {
    p->b = 1;
    p->h = 2;
    p->w = 3;
    p->d = 4;
  }

  $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
  ...
  0000000000000000 <bar>:
     0: 72 01 00 00 01 00 00 00  *(u8 *)(r1 + 0x0) = 0x1
     1: 6a 01 02 00 02 00 00 00  *(u16 *)(r1 + 0x2) = 0x2
     2: 62 01 04 00 03 00 00 00  *(u32 *)(r1 + 0x4) = 0x3
     3: 7a 01 08 00 04 00 00 00  *(u64 *)(r1 + 0x8) = 0x4
     4: 95 00 00 00 00 00 00 00  exit

Take special care to:
- apply the `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST;
- validate the immediate value when the BPF_ST write is 64-bit: BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such a write when the immediate is -1, but it is incorrect to generate such a write when the immediate is +0xffff_ffff.

Differential Revision: https://reviews.llvm.org/D140804	2023-08-11 02:07:29 +03:00
Tamir Duberstein	055893beac	[BPF] Don't crash on missing line info When compiling Rust code we may end up with calls to functions provided by other code units. Presently this code crashes on a null pointer dereference - this patch avoids that crash and adds a test. Reviewed By: ast Differential Revision: https://reviews.llvm.org/D156446	2023-08-03 09:18:12 -04:00
Tamir Duberstein	6b5e486428	Revert "[BPF] Narrow some interfaces" Review requested by @ast and @yonghong-song. This reverts commit 82bc1839bc53db296be36947ec0c482c8ca3a3d8.	2023-08-02 14:50:01 -04:00
Tamir Duberstein	82bc1839bc	[BPF] Narrow some interfaces When compiling Rust code we sometimes see incomplete debug info leading to crashes. Narrow the interfaces so we can see where it happens. Reviewed By: ajwerner Differential Revision: https://reviews.llvm.org/D156443	2023-08-02 14:08:30 -04:00
Tamir Duberstein	f2bd78415f	[BPF] Avoid repeating MI->getOperand(NumDefs) x3 Differential Revision: https://reviews.llvm.org/D156445	2023-08-02 10:56:01 -04:00
Tamir Duberstein	d542a56c1c	[BPF] Clean up SelLowering This patch contains a number of uncontroversial changes: - Replace all uses of `errs`, `assert`, `llvm_unreachable` with `report_fatal_error` with informative error strings. - Replace calls to `fail` in loops with at most one call per error instance. Previously a function with 19 arguments would log "too many args" 14 times. This was not helpful. - Change one `if (..) switch ...` to `if (..) { switch ...`. The added brace is consistent with a near-identical switch immediately above. - Elide one `SDValue` copy by using a reference rather than value. This is consistent with a variable declared immediately before it. Reviewed By: yonghong-song Differential Revision: https://reviews.llvm.org/D156136	2023-08-01 00:31:12 +03:00
Yonghong Song	6c412b6c6f	[BPF] Add a few new insns under cpu=v4

In [1], a few new insns are proposed to expand the BPF ISA by
  . fixing the limitations of existing insns (e.g., 16bit jmp offset)
  . adding new insns which may improve code quality (sign_ext_ld, sign_ext_mov, st)
  . making the ISA feature complete (sdiv, smod)
  . improving user experience (bswap)

This patch implements insn encoding for
  . sign-extended load
  . sign-extended mov
  . sdiv/smod
  . bswap insns
  . unconditional jump with 32bit offset

The new bswap insns are generated under cpu=v4 for __builtin_bswap. For cpu=v3 or earlier, be or le insns are generated for __builtin_bswap, which is not intuitive for the user.

To support a 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented. For a conditional branch that is beyond the 16-bit offset, llvm will do the transformation 'cond_jmp' -> 'cond_jmp + jmpl' to simulate a 32bit conditional jmp. See BPFMIPeephole.cpp for details. The algorithm is heuristic based. I have tested the bpf selftest pyperf600 with unroll count 600, which can indeed generate a 32-bit jump insn, e.g.,

  13: 06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619>

Eduard is working on adding the 'st' insn to cpu=v4.

A list of llc flags:
  disable-ldsx, disable-movsx, disable-bswap, disable-sdiv-smod, disable-gotol
can be used to disable a particular insn for cpu v4. For example, the user can do:

  llc -march=bpf -mcpu=v4 -disable-movsx t.ll

to enable cpu v4 without movsx insns.

References:
[1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/

Differential Revision: https://reviews.llvm.org/D144829	2023-07-26 08:37:30 -07:00
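To make the `gotol` encoding quoted in the commit message above easier to read, here is a minimal sketch that splits an 8-byte little-endian BPF instruction into its fields (8-bit opcode, two 4-bit register numbers, 16-bit offset, 32-bit immediate). The struct and function names are hypothetical and this is not how the LLVM disassembler is implemented; it only illustrates why the 32-bit jump offset lands in the immediate field rather than in the 16-bit offset field.

```cpp
#include <array>
#include <cstdint>

// Fields of one basic (64-bit) BPF instruction.
struct BPFInsn {
  uint8_t Opcode;
  uint8_t DstReg; // low nibble of byte 1 in the little-endian encoding
  uint8_t SrcReg; // high nibble of byte 1 in the little-endian encoding
  int16_t Off;    // 16-bit signed offset
  int32_t Imm;    // 32-bit signed immediate
};

// Decodes the 8 bytes as printed by llvm-objdump for a little-endian target.
inline BPFInsn decodeLittleEndian(const std::array<uint8_t, 8> &Bytes) {
  BPFInsn Insn{};
  Insn.Opcode = Bytes[0];
  Insn.DstReg = Bytes[1] & 0x0f;
  Insn.SrcReg = Bytes[1] >> 4;
  Insn.Off = static_cast<int16_t>(
      static_cast<uint16_t>(Bytes[2] | (Bytes[3] << 8)));
  Insn.Imm = static_cast<int32_t>(static_cast<uint32_t>(Bytes[4]) |
                                  (static_cast<uint32_t>(Bytes[5]) << 8) |
                                  (static_cast<uint32_t>(Bytes[6]) << 16) |
                                  (static_cast<uint32_t>(Bytes[7]) << 24));
  return Insn;
}
```

Decoding the bytes `06 00 00 00 9b cd 00 00` gives opcode `0x06`, a zero 16-bit offset, and immediate `0xcd9b`, which matches the `gotol +0xcd9b` shown in the disassembly.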
Eduard Zingerman	18e13739b8	[BPF] Undo transformation for LICM.cpp:hoistMinMax()

Extended the BPFCheckAndAdjustIR pass with a sinkMinMax() transformation that undoes the LICM hoistMinMax pass. The undo transformation converts the following patterns:

  x < min(a, b) -> x < a && x < b
  x > min(a, b) -> x > a || x > b
  x < max(a, b) -> x < a || x < b
  x > max(a, b) -> x > a && x > b

where 'a' or 'b' is a constant. It also supports `sext min(...) ...` and `zext min(...) ...`.

~~~

This was previously committed as 09feee559a29 and reverted in 0bf9bfeacc8c because of the testbot memory leak report:
https://lab.llvm.org/buildbot/#/builders/5/builds/34931

The memory leak was caused by an incorrect instruction removal sequence in sinkMinMaxInBB():

  I->dropAllReferences();
  I->removeFromParent();

which was fixed to:

  I->eraseFromParent();

Differential Revision: https://reviews.llvm.org/D147990	2023-07-11 22:30:34 +03:00
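The four rewrite patterns listed in the sinkMinMax() commit above are plain logical equivalences, which a brute-force check over a small integer range can confirm. The sketch below uses hypothetical function names and ordinary `int` comparisons; it says nothing about how the pass matches or rewrites IR, only that the right-hand sides compute the same truth value as the left-hand sides.

```cpp
#include <algorithm>
#include <cassert>

// True if all four rewrites give the same result as the original
// comparison against min/max for this particular (X, A, B).
constexpr bool sinkRulesHold(int X, int A, int B) {
  return ((X < std::min(A, B)) == (X < A && X < B)) &&
         ((X > std::min(A, B)) == (X > A || X > B)) &&
         ((X < std::max(A, B)) == (X < A || X < B)) &&
         ((X > std::max(A, B)) == (X > A && X > B));
}

// Exhaustively checks every (X, A, B) with all three values in [Lo, Hi].
inline bool sinkRulesHoldInRange(int Lo, int Hi) {
  for (int X = Lo; X <= Hi; ++X)
    for (int A = Lo; A <= Hi; ++A)
      for (int B = Lo; B <= Hi; ++B)
        if (!sinkRulesHold(X, A, B))
          return false;
  return true;
}
```

A small range such as `sinkRulesHoldInRange(-8, 8)` already covers every ordering of `x`, `a`, and `b`, including the cases where two or all three are equal.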
Eduard Zingerman	0e7ff05fb3	[BPF][DebugInfo][NFC] Move BTF.h definitions from BPF target to DebugInfo There are plans to add some BTF processing to tools like objdump and readelf. This commit moves BTF.{h,def} files from BPF target specific location to include/llvm/DebugInfo/* to avoid tools including headers from lib/Target/*. Reviewed By: yonghong-song, MaskRay Differential Revision: https://reviews.llvm.org/D149501	2023-07-10 14:50:21 -07:00


587 Commits