llvm-project

Author	SHA1	Message	Date
Fangrui Song	abfff89b74	[MC] Chain together fragments only if Subsections.size() > 1 and delete an unneeded setParent call.	2024-06-27 10:35:45 -07:00
Fangrui Song	21fac2d1d0	[MC] Ensure all new sections have a MCDataFragment MCAssembler::layout ensures that every section has at least one fragment, which simplifies MCAsmLayout::getSectionAddressSize (see e73353c7201a3080851d99a16f5fe2c17f7697c6 from 2010). It's better to ensure the condition is satisfied at create time (COFF, GOFF, Mach-O) to simplify more fragment processing.	2024-06-23 10:08:52 -07:00
Fangrui Song	c9f6a5e495	[MC] Move computeBundlePadding closer to its only caller. NFC There is only one caller after #95188.	2024-06-22 13:28:07 -07:00
Fangrui Song	7500646629	[MC] Remove pending labels This commit removes the complexity introduced by pending labels in https://reviews.llvm.org/D5915 by using a simpler approach. D5915 aimed to ensure padding placement before `.Ltmp0` for the following code, but at the cost of expensive per-instruction `flushPendingLabels`. ``` // similar to llvm/test/MC/X86/AlignedBundling/labeloffset.s .bundle_lock align_to_end calll .L0$pb .bundle_unlock .L0$pb: popl %eax .Ltmp0: //// padding should be inserted before this label instead of after addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax ``` (D5915 was adjusted by https://reviews.llvm.org/D8072 and https://reviews.llvm.org/D71368) This patch achieves the same goal by setting the offset of the empty MCDataFragment (`Prev`) in `layoutBundle`. This eliminates the need for pending labels and simplifies the code. llvm/test/MC/MachO/pending-labels.s (D71368): relocation symbols are changed, but the result is still supported by linkers.	2024-06-22 00:34:16 -07:00
Fangrui Song	87424778ef	[MC] Remove the Parent parameter from MCFragment ctor callers. NFC	2024-06-21 22:47:55 -07:00
Fangrui Song	b1932b8483	[MC] Aligned bundling: remove special handling for RelaxAll When both aligned bundling and RelaxAll are enabled, bundle padding is directly written into fragments (https://reviews.llvm.org/D8072). (The original motivation was memory usage, which has been achieved from different angles with recent assembler improvement). The code presents challenges with the work to replace fragment representation (e.g. #94950 #95077). This patch removes the special handling. RelaxAll still works but the behavior seems slightly different as revealed by 2 changed tests. However, most `-mc-relax-all` tests are unchanged. RelaxAll used to be the default for clang -O0. This mode has significant code size drawbacks and newer Clang doesn't use it (#90013). --- flushPendingLabels: The FOffset parameter can be removed: pending labels will be assigned to the incoming fragment at offset 0. Pull Request: https://github.com/llvm/llvm-project/pull/95188	2024-06-14 10:01:36 -07:00
Fangrui Song	f808abf508	[MC] Add MCFragment allocation helpers `allocFragment` might be changed to a placement new when the allocation strategy changes. `allocInitialFragment` is to deduplicate the following pattern ``` auto F = new MCDataFragment(); Result->addFragment(F); F->setParent(Result); ``` Pull Request: https://github.com/llvm/llvm-project/pull/95197	2024-06-14 09:39:32 -07:00
Fangrui Song	de19f7b6d4	[MC] Replace fragment ilist with singly-linked lists Fragments are allocated with `operator new` and stored in an ilist with Prev/Next/Parent pointers. A more efficient representation would be an array of fragments without the overhead of Prev/Next pointers. As the first step, replace ilist with singly-linked lists. * `getPrevNode` uses have been eliminated by previous changes. * The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and the current insertion point is stored at `CurInsertionPoint`. * `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all fragments within `Frags`. Hexagon programs are usually small, and the performance does not matter that much. To eliminate `Prev`, change the subsection representation to singly-linked lists for subsections and a pointer to the active singly-linked list. The fragments from all subsections will be chained together at layout time. Since fragment lists are disconnected before layout time, we can remove `MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411). The current implementation of `AttemptToFoldSymbolOffsetDifference` requires future improvement for robustness. Pull Request: https://github.com/llvm/llvm-project/pull/95077	2024-06-11 09:18:31 -07:00
Fangrui Song	cb63abca27	[MC] Remove getFragmentList uses. NFC	2024-06-10 18:27:34 -07:00
Fangrui Song	bb4ee27a31	[MC] Remove the last MCFragment::getPrevNode use. NFC	2024-06-09 23:14:15 -07:00
Fangrui Song	9d0754ada5	[MC] Relax fragments eagerly Lazy relaxation caused hash table lookups (`getFragmentOffset`) and complex use/compute interdependencies. Some expressions involding forward declared symbols (e.g. `subsection-if.s`) cannot be computed. Recursion detection requires complex `IsBeingLaidOut` (https://reviews.llvm.org/D79570). D76114's `invalidateFragmentsFrom` makes lazy relaxation even less useful. Switch to eager relaxation to greatly simplify code and resolve these issues. This change also removes a `getPrevNode` use, which makes it more feasible to replace the fragment representation, which might yield a large peak RSS win. Minor downsides: The number of section relaxations may increase (offset by avoiding the hash table lookup). For relax-recompute-align.s, the computed layout is not optimal.	2024-06-09 23:05:05 -07:00
Amir Ayupov	9fec33aadc	Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653 )" This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9. Accidentally pushed unrelated changes.	2024-01-18 19:59:09 -08:00
Amir Ayupov	82bc33ea3f	[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653 ) Fix the bug where merge-fdata unconditionally outputs boltedcollection line, regardless of whether input files have it set. Test Plan: Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this fix.	2024-01-18 19:44:16 -08:00
Jinyang He	b57159cb19	[LoongArch] Support R_LARCH_{ADD,SUB}_ULEB128 for .uleb128 and force relocs when sym is not in section (#76433 ) 1, Follow RISCV 1df5ea29 to support generates relocs for .uleb128 which can not be folded. Unlike RISCV, the located content of LoongArch should be zero. LoongArch fixup uleb128 value by in-place addition and subtraction reloc types named R_LARCH_{ADD,SUB}_ULEB128. The located content can affect the result and R_LARCH_ADD_ULEB128 has enough info to represent the first symbol value, so it needs to be set to zero. 2, Force relocs if sym is not in section so that it can emit relocs for external symbol. Fixes: https://github.com/llvm/llvm-project/pull/72960#issuecomment-1866844679	2024-01-09 15:14:54 +08:00
Craig Topper	e87f33d9ce	[RISCV][MC] Pass MCSubtargetInfo down to shouldForceRelocation and evaluateTargetFixup. (#73721 ) Instead of using the STI stored in RISCVAsmBackend, try to get it from the MCFragment. This addresses the issue raised here https://discourse.llvm.org/t/possible-problem-related-to-subtarget-usage/75283	2023-12-07 13:17:58 -08:00
Fangrui Song	1df5ea29b4	[RISCV] Support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 for .uleb128 directives For a label difference like `.uleb128 A-B`, MC folds A-B even if A and B are separated by a RISC-V linker-relaxable instruction. This incorrect behavior is currently abused by DWARF v5 .debug_loclists/.debug_rnglists (DW_LLE_offset_pair/DW_RLE_offset_pair entry kinds) implemented in Clang/LLVM (see https://github.com/ClangBuiltLinux/linux/issues/1719 for an instance). `96d6e190e9` defined R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128. This patch generates such a pair of relocations to represent A-B that should not be folded. GNU assembler computes the directive size by ignoring shrinkable section content, therefore after linking the value of A-B cannot use more bytes than the reserved number (`final size of uleb128 value at offset ... exceeds available space`). We make the same assumption. ``` w1: call foo w2: .space 120 w3: .uleb128 w2-w1 # 1 byte, 0x08 .uleb128 w3-w1 # 2 bytes, 0x80 0x01 ``` We do not conservatively reserve 10 bytes (maximum size of an uleb128 for uint64_t) as that would pessimize DWARF v5 DW_LLE_offset_pair/DW_RLE_offset_pair, nullifying the benefits of introducing R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 relocations. The supported expressions are limited. For example, * non-subtraction `.uleb128 A` is not allowed * `.uleb128 A-B`: report an error unless A and B are both defined and in the same section The new cl::opt `-riscv-uleb128-reloc` can be used to suppress the relocations. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157657	2023-11-09 09:27:32 -08:00
Kazu Hirata	4a0ccfa865	Use llvm::endianness::{big,little,native} (NFC) Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class as opposed to an enum. This patch replaces support::{big,little,native} with llvm::endianness::{big,little,native}.	2023-10-12 21:21:45 -07:00
Kazu Hirata	a9d5056862	Use llvm::endianness (NFC) Now that llvm::support::endianness has been renamed to llvm::endianness, we can use the shorter form. This patch replaces support::endianness with llvm::endianness.	2023-10-10 21:54:15 -07:00
Fangrui Song	d7398a3554	[MC] Use reportError for .uleb128/.sleb128 diagnostic User errors should use reportError. reportError allows us to continue parsing the file and collect more diagnostics. MC/ELF/leb128-err.s is adapted from MC/RISCV/riscv64-64b-pcrel.s	2023-08-09 17:32:09 -07:00
Fangrui Song	ffa829c4c5	[RISCV] Allow delayed decision for ADD/SUB relocations For a label difference `A-B` in assembly, if A and B are separated by a linker-relaxable instruction, we should emit a pair of ADD/SUB relocations (e.g. R_RISCV_ADD32/R_RISCV_SUB32, R_RISCV_ADD64/R_RISCV_SUB64). However, the decision is made upfront at parsing time with inadequate heuristics (`requiresFixup`). As a result, LLVM integrated assembler incorrectly suppresses R_RISCV_ADD32/R_RISCV_SUB32 for the following code: ``` // Simplified from a workaround https://android-review.googlesource.com/c/platform/art/+/2619609 // Both end and begin are not defined yet. We decide ADD/SUB relocations upfront and don't know they will be needed. .4byte end-begin begin: call foo end: ``` To fix the bug, make two primary changes: * Delete `requiresFixups` and the overridden emitValueImpl (from D103539). This deletion requires accurate evaluateAsAbolute (D153097). * In MCAssembler::evaluateFixup, call handleAddSubRelocations to emit ADD/SUB relocations. However, there is a remaining issue in MCExpr.cpp:AttemptToFoldSymbolOffsetDifference. With MCAsmLayout, we may incorrectly fold A-B even when A and B are separated by a linker-relaxable instruction. This deficiency is acknowledged (see D153097), but was previously bypassed by eagerly emitting ADD/SUB using `requiresFixups`. To address this, we partially reintroduce `canFold` (from D61584, removed by D103539). Some expressions (e.g. .size and .fill) need to take the `MCAsmLayout` code path in AttemptToFoldSymbolOffsetDifference, avoiding relocations (weird, but matching GNU assembler and needed to match user expectation). Switch to evaluateKnownAbsolute to leverage the `InSet` condition. As a bonus, this change allows for the removal of some relocations for the FDE `address_range` field in the .eh_frame section. riscv64-64b-pcrel.s contains the main test. Add a linker relaxable instruction to dwarf-riscv-relocs.ll to test what it intends to test. Merge fixups-relax-diff.ll into fixups-diff.ll. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D155357	2023-07-21 08:37:58 -07:00
Fangrui Song	f9b14f0b7c	[MC] Report location information for MCDwarfCallFrameFragment diagnostics	2023-06-26 18:02:22 -07:00
Fangrui Song	0b0672773e	[MC] Reject CFI advance_loc separated by a non-private label for Mach-O Due to Mach-O's .subsections_via_symbols mechanism, non-private labels cannot appear between .cfi_startproc/.cfi_endproc. Compilers do not produce such labels, but hand-written assembly may. Give an error. Unfortunately, emitDwarfAdvanceFrameAddr generated MCExpr doesn't have location informatin. Note: evaluateKnownAbsolute is to force folding A-B to a constant even if A and B are separate by a non-private label. The function is a workaround for some Mach-O assembler issues and should generally be avoided. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D153167	2023-06-26 14:26:06 -07:00
Daniel Hoekwater	d8cef4f8fa	[MC] Detect out of range jumps further than 2^32 bytes On AArch64, object files may be greater than 2^32 bytes. If an offset is greater than the max value of a 32-bit unsigned integer, LLVM silently truncates the offset. Instead, make it return an error. Differential Revision: https://reviews.llvm.org/D153494	2023-06-22 21:56:22 +00:00
Volodymyr Sapsai	e872e162a3	Revert "[RISCV] relaxDwarfCallFrameFragment: remove unneeded relocations for relaxation" Failing buildbot https://green.lab.llvm.org/green/job/clang-stage1-RA/34684/ This reverts commit 11ebe3d906558d93a607347de472e7718127f409.	2023-06-16 12:01:54 -07:00
Fangrui Song	11ebe3d906	[RISCV] relaxDwarfCallFrameFragment: remove unneeded relocations for relaxation If `evaluateAsAbsolute(Value, Layout.getAssembler())` returns true, we know the address delta is a constant and can suppress relocations (usually SET6/SUB6). While here, replace two evaluateKnownAbsolute calls (subtle; avoid if possible) with evaluateAsAbsolute.	2023-06-15 23:26:25 -07:00
Fangrui Song	390643e3c5	MCDwarfFrameEmitter::EncodeAdvanceLoc: use SmallVectorImpl instead of raw_ostream. NFC Similar to 49488490d195591bfc90daef111cd7293f8c80aa. Remove MCDwarfFrameEmitter::EmitAdvanceLoc which is only called once.	2023-05-07 19:32:53 -07:00
Fangrui Song	49488490d1	[MC] MCDwarfLineAddr::Encode: use SmallVectorImpl instead of raw_ostream. NFC Similar to D145791: most call sites need a SmallString, but have to provide a raw_svector_ostream wrapper with unneeded abstraction and overhead: raw_ostream::write =(inlinable)=> flush_tied_then_write (unneeded TiedStream check) =(virtual function call)=> raw_svector_ostream::write_impl ==> SmallVector append(ItTy in_start, ItTy in_end) (range; less efficient then push_back). Just use SmallVectorImpl to simplify and optimize code. Unfortunately most call sites use SmallString, so we have to use SmallVectorImpl<char> instead of <uint8_t> to avoid large refactoring.	2023-05-07 16:26:52 -07:00
Fangrui Song	a08cbabb28	[MC] Optimize relaxInstruction: remove SmallVector copy. NFC	2023-05-04 23:34:35 -07:00
Fangrui Song	01260bbc6b	[MC] registerSymbol: change an output paramter to return value	2023-05-04 22:17:56 -07:00
Alexis Engelke	0c049ea60a	[MC] Always encode instruction into SmallVector All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream however incurs some overhead for the actual encoding. This change allows an MCCodeEmitter to directly emit an instruction into a SmallVector without using a raw_ostream and therefore allow for performance improvments in encoding. A default path that uses existing raw_ostream implementations is provided. Reviewed By: MaskRay, Amir Differential Revision: https://reviews.llvm.org/D145791	2023-04-06 16:21:49 +02:00
spupyrev	eecd41aa09	Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type" This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.	2022-07-11 09:50:47 -07:00
Rafael Auler	6d0528636a	Rebase: [Facebook] [MC] Introduce NeverAlign fragment type Summary: Introduce NeverAlign fragment type. The intended usage of this fragment is to insert it before a pair of macro-op fusion eligible instructions. NeverAlign fragment ensures that the next fragment (first instruction in the pair) does not end at a given alignment boundary by emitting a minimal size nop if necessary. In effect, it ensures that a pair of macro-fusible instructions is not split by a given alignment boundary, which is a precondition for macro-op fusion in modern Intel Cores (64B = cache line size, see Intel Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode Pipeline: Macro-Fusion). This patch introduces functionality used by BOLT when emitting code with MacroFusion alignment already in place. The use case is different from BoundaryAlign and instruction bundling: - BoundaryAlign can be extended to perform the desired alignment for the first instruction in the macro-op fusion pair (D101817). However, this approach has higher overhead due to reliance on relaxation as BoundaryAlign requires in the general case - see https://reviews.llvm.org/D97982#2710638. - Instruction bundling: the intent of NeverAlign fragment is to prevent the first instruction in a pair ending at a given alignment boundary, by inserting at most one minimum size nop. It's OK if either instruction crosses the cache line. Padding both instructions using bundles to not cross the alignment boundary would result in excessive padding. There's no straightforward way to request instruction bundling to avoid a given end alignment for the first instruction in the bundle. LLVM: https://reviews.llvm.org/D97982 Manual rebase conflict history: https://phabricator.intern.facebook.com/D30142613 Test Plan: sandcastle Reviewers: #llvm-bolt Subscribers: phabricatorlinter Differential Revision: https://phabricator.intern.facebook.com/D31361547	2022-07-11 09:31:52 -07:00
Guillaume Chatelet	412c788ab0	[NFC][Alignment] Use Align in MCAlignFragment	2022-06-15 12:31:00 +00:00
Fangrui Song	689c3a2552	[MC] Fix letter case of some MCSection member functions	2022-03-11 20:07:00 -08:00
serge-sans-paille	ef736a1c39	Cleanup LLVMMC headers There's a few relevant forward declarations in there that may require downstream adding explicit includes: llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h Counting preprocessed lines required to rebuild llvm-project on my setup: before: 1052436830 after: 1049293745 Which is significant and backs up the change in addition to the usual benefits of decreasing coupling between headers and compilation units. Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119244	2022-02-09 11:09:17 +01:00
Alex Lorenz	0756aa3978	[macho] add support for emitting macho files with two build version load commands This patch extends LLVM IR to add metadata that can be used to emit macho files with two build version load commands. It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that, which will be set by a future patch in clang. MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target, and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support. Differential Revision: https://reviews.llvm.org/D112189	2021-12-07 18:17:47 -08:00
Peter Smith	e63455d5e0	[MC] Use local MCSubtargetInfo in writeNops On some architectures such as Arm and X86 the encoding for a nop may change depending on the subtarget in operation at the time of encoding. This change replaces the per module MCSubtargetInfo retained by the targets AsmBackend in favour of passing through the local MCSubtargetInfo in operation at the time. On Arm using the architectural NOP instruction can have a performance benefit on some implementations. For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to limit the chances of this causing problems in the future. I've not done this for other targets such as X86 as there is more frequent use of the MCSubtargetInfo and it looks to be for stable properties that we would not expect to vary per function. This change required threading STI through MCNopsFragment and MCBoundaryAlignFragment. I've attempted to take into account the in tree experimental backends. Differential Revision: https://reviews.llvm.org/D45962	2021-09-07 15:46:19 +01:00
Saleem Abdulrasool	bbea64250f	RISCV: adjust handling of relocation emission for RISCV This re-architects the RISCV relocation handling to bring the implementation closer in line with the implementation in binutils. We would previously aggressively resolve the relocation. With this restructuring, we always will emit a paired relocation for any symbolic difference of the type of S±T[±C] where S and T are labels and C is a constant. GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE` which indicates that a fixup may be expanded into multiple relocations. This is used by the RISCV backend to always emit a paired relocation - either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] + SUB[WIDTH] for a debug info relocation. Irrespective of whether linker relaxation support is enabled, symbolic difference is always emitted as a paired relocation. This change also sinks the target specific behaviour down into the target specific area rather than exposing it to the shared relocation handling. In the process, we also sink the "special" handling for debug information down into the RISCV target. Although this improves the path for the other targets, this is not necessarily entirely ideal either. The changes in the debug info emission could be done through another type of hook as this functionality would be required by any other target which wishes to do linker relaxation. However, as there are no other targets in LLVM which currently do this, this is a reasonable thing to do until such time as the code needs to be shared. Improve the handling of the relocation (and add a reduced test case from the Linux kernel) to ensure that we handle complex expressions for symbolic difference. This ensures that we correct relocate symbols with the adddends normalized and associated with the addition portion of the paired relocation. This change also addresses some review comments from Alex Bradbury about the relocations meant for use in the DWARF CFA being named incorrectly (using ADD6 instead of SET6) in the original change which introduced the relocation type. This resolves the issues with the symbolic difference emission sufficiently to enable building the Linux kernel with clang+IAS+lld (without linker relaxation). Resolves PR50153, PR50156! Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143 Reviewed By: nickdesaulniers, maskray Differential Revision: https://reviews.llvm.org/D103539	2021-06-17 08:20:02 -07:00
Fangrui Song	71635ea5ff	MCDwarf: Delete uneeded parameter And change signature	2021-01-21 00:55:07 -08:00
Hongtao Yu	705a4c149d	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 17:29:28 -08:00
Mitch Phillips	7ead5f5aa3	Revert "[CSSPGO] Pseudo probe encoding and emission." This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1. Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269	2020-12-10 15:53:39 -08:00
Hongtao Yu	b035513c06	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 09:50:08 -08:00
Fangrui Song	35be65cb1c	[MC] Fix an assert in MCAssembler::writeSectionData to be aware of errors If MCContext has an error, MCAssembler::layout may stop early and some MCFragment's may not finalize. In the Linux kernel, arch/x86/lib/memcpy_64.S could trigger the assert before "x86_64: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S"	2020-10-29 23:11:18 -07:00
Fangrui Song	45d0343900	[MC] Allow .org directives in SHT_NOBITS sections This is used by kvm-unit-tests and can be trivially supported.	2020-09-11 15:12:42 -07:00
Jian Cai	c6334db577	[X86] support .nops directive Add support of .nops on X86. This addresses llvm.org/PR45788. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D82826	2020-08-03 11:50:56 -07:00
Shengchen Kan	e59e39b7c4	[MC] Simplify the logic of applying fixup for fragments, NFCI Replace mutiple `if else` clauses with a `switch` clause and remove redundant checks. Before this patch, we need to add a statement like `if(!isa<MCxxxFragment>(Frag)) ` here each time we add a new kind of `MCEncodedFragment` even if it has no fixups. After this patch, we don't need to do that. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D83366	2020-07-09 16:39:13 +08:00
Thomas Preud'homme	6c67ee0f58	[MC] Fix PR45805: infinite recursion in assembler Give up folding an expression if the fragment of one of the operands would require laying out a fragment already being laid out. This prevents hitting an infinite recursion when a fill size expression refers to a later fragment since computing the offset of that fragment would require laying out the fill fragment and thus computing its size expression. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D79570	2020-06-25 15:42:36 +01:00
Shengchen Kan	8bb059ab63	[MC][Bugfix] Remove redundant parameter for relaxInstruction Summary: Before this patch, `relaxInstruction` takes three arguments, the first argument refers to the instruction before relaxation and the third argument is the output instruction after relaxation. There are two quite strange things: 1) The first argument's type is `const MCInst &`, the third argument's type is `MCInst &`, but they may be aliased to the same variable 2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third argument is a fresh uninitialized `MCInst` even if `relaxInstruction` may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a loop. In this patch, we drop the thrid argument, and let `relaxInstruction` directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon. Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain Reviewed By: Razer6, MaskRay, bcain Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78364	2020-04-21 11:06:55 +08:00
Simon Pilgrim	bcd7f77713	MCObjectWriter.h - remove Endian.h/EndianStream.h/raw_ostream.h includes. NFC Push these includes down to the the writers that actually need them, a number of which were implicitly relying on the MCObjectWriter.h.	2020-04-17 10:44:08 +01:00
Fangrui Song	e13a8a1fc5	[MC][COFF][ELF] Reject instructions in IMAGE_SCN_CNT_UNINITIALIZED_DATA/SHT_NOBITS sections For `.bss; nop`, MC inappropriately calls abort() (via report_fatal_error()) with a message `cannot have fixups in virtual section!` It is a bug to crash for invalid user input. Fix it by erroring out early in EmitInstToData(). Similarly, emitIntValue() in a virtual section (SHT_NOBITS in ELF) can crash with the mssage `non-zero initializer found in section '.bss'` (see D4199) It'd be nice to report the location but so many directives can call emitIntValue() and it is difficult to track every location. Note, COFF does not crash because MCAssembler::writeSectionData() is not called for an IMAGE_SCN_CNT_UNINITIALIZED_DATA section. Note, GNU as' arm64 backend reports ``Error: attempt to store non-zero value in section `.bss'`` for a non-zero .inst but fails to do so for other instructions. We simply reject all instructions, even if the encoding is all zeros. The Mach-O counterpart is D48517 (see `test/MC/MachO/zerofill-text.s`) Reviewed By: rnk, skan Differential Revision: https://reviews.llvm.org/D78138	2020-04-15 21:02:47 -07:00

1 2 3 4 5 ...

475 Commits