llvm-project

Author	SHA1	Message	Date
Sam Elliott	9e8f7acd2b	[RISCV] Track Linker Relaxable through Assembly Relaxation (#152602 ) Span-dependent instructions on RISC-V interact in a complex manner with linker relaxation. The span-dependent assembler algorithm implemented in LLVM has to start with the smallest version of an instruction and then only make it larger, so we compress instructions before emitting them to the streamer. When the instruction is streamed, the information that the instruction (or rather, the fixup on the instruction) is linker relaxable must be accurate, even though the assembler relaxation process may transform a not-linker-relaxable instruction/fixup into one that that is linker relaxable, for instance `c.jal` becoming `qc.e.jal`, or `bne` getting turned into `beq; jal` (the `jal` is linker relaxable). In order for this to work, the following things have to happen: - Any instruction/fixup which might be relaxed to a linker-relaxable instruction/fixup, gets marked as `RelaxCandidate = true` in RISCVMCCodeEmitter. - In RISCVAsmBackend, when emitting the `R_RISCV_RELAX` relocation, we have to check that the relocation/fixup kind is one that may need a relax relocation, as well as that it is marked as linker relaxable (the latter will not be set if relaxation is disabled). - Linker Relaxable instructions streamed to a Relaxable fragment need to mark the fragment and its section as linker relaxable. I also added more debug output for Sections/Fixups which are marked Linker Relaxable. This results in more relocations, when these PC-relative fixups cross an instruction with a fixup that is resolved as not linker-relaxable but caused the fragment to be marked linker relaxable at streaming time (i.e. `c.j`). Fixes: #150071	2025-08-12 09:02:48 +01:00
Fangrui Song	f517ac2083	MCSectionCOFF: Avoid cast The object file format specific derived classes are used in context like MCStreamer and MCObjectTargetWriter where the type is statically known. We don't use isa/dyn_cast and we want to eliminate MCSection::SectionVariant in the base class.	2025-07-26 10:04:04 -07:00
Fangrui Song	fd6d6a7c8d	MC: Refactor FT_Align fragments when linker relaxation is enabled Previously, two MCAsmBackend hooks were used, with shouldInsertFixupForCodeAlign calling getWriter().recordRelocation directly, bypassing generic code. This patch: * Introduces MCAsmBackend::relaxAlign to replace the two hooks. * Tracks padding size using VarContentEnd (content is ignored). * Move setLinkerRelaxable from MCObjectStreamer::emitCodeAlignment to the backends. Pull Request: https://github.com/llvm/llvm-project/pull/149465	2025-07-20 00:55:54 -07:00
Fangrui Song	2ba5e0ad17	MC: Encode FT_Align in fragment's variable-size tail Follow-up to #148544 Pull Request: https://github.com/llvm/llvm-project/pull/149030	2025-07-20 00:46:51 -07:00
Fangrui Song	1fcf49a35c	MC: Replace FT_PseudoProbe with FT_LEB The fragment type introduced by https://reviews.llvm.org/D91878 is unnecessary and can be replaced with FT_LEB.	2025-07-19 10:15:25 -07:00
Fangrui Song	6d0f573535	MCFragment: Remove MCDataFragment/MCRelaxableFragment type aliases Follow-up to #148544	2025-07-15 22:14:39 -07:00
Fangrui Song	dc3a4c0fcf	MC: Restructure MCFragment as a fixed part and a variable tail Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`, previously encoded as `MCDataFragment(nop); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`, to ``` MCFragment(fixed: push rax, variable: jmp foo) MCFragment(fixed: nop, variable: jmp foo) ``` Changes: * Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment. * The new MCFragment contains a fixed-size content (similar to previous MCDataFragment) and an optional variable-size tail. * The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and FT_DwarfFrame, with plans to extend to other fragment types. dyn_cast/isa should be avoided for the converted fragment subclasses. * In `setVarFixups`, source fixup offsets are relative to the variable part's start. Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start. A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`, expecting the fixup offset to be relative to the fixed part's start. * HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the associated instruction for a fixup. We have to add a `const MCFragment &` parameter. * In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to FT_Relaxable as otherwise there would be many more FT_DwarfFrame fragments in -g compilations. https://llvm-compile-time-tracker.com/compare.php?from=28e1473e8e523150914e8c7ea50b44fb0d2a8d65&to=778d68ad1d48e7f111ea853dd249912c601bee89&stat=instructions:u ``` stage2-O0-g instructins:u geomeon (-0.07%) stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%) ``` ``` % /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &\| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}' Total 59675 Align 2215 Data 29700 Dwarf 12044 DwarfCallFrame 4216 Fill 92 LEB 12 Relaxable 11396 % /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &\| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}' Total 32287 Align 2215 Data 2312 Dwarf 12044 DwarfCallFrame 4216 Fill 92 LEB 12 Relaxable 11396 ``` Pull Request: https://github.com/llvm/llvm-project/pull/148544	2025-07-15 21:56:55 -07:00
Fangrui Song	28e1473e8e	MC: Remove bundle alignment mode The being-removed PNaCl has a Software Fault Isolation mechanism, which requires that certain instructions and groups of instructions do not cross a bundle boundary. When `.bundle_align_mode` is in effect, each instruction is placed in its own fragment, allowing flexible NOP padding. This feature has significantly complicated our refactoring of MCStreamer and MCFragment, leading to considerable effort spent untangling it (including flushPendingLabels (75006466296ed4b0f845cbbec4bf77c21de43b40), MCAssembler iteration improvement, and recent MCFragment refactoring). * Make MCObjectStreamer::emitInstToData non-virtual and delete MCELFStreamer::emitInstTodata * Delete MCELFStreamer::emitValueImpl and emitValueToAlignment Minor instructions:u decrease for both -O0 -g and -O3 builds https://llvm-compile-time-tracker.com/compare.php?from=c06d3a7b728293cbc53ff91239d6cd87c0982ffb&to=9b078c7f228bc5b6cdbfe839f751c9407f8aec3e&stat=instructions:u Pull Request: https://github.com/llvm/llvm-project/pull/148781	2025-07-15 19:36:19 -07:00
Fangrui Song	1fbfa333f6	MCAlignFragment: Rename fields and use uint8_t FillLen * Rename the vague `Value` to `Fill`. * FillLen is at most 8. Making the field smaller to facilitate encoding MCAlignFragment as a MCFragment union member. * Replace an unreachable report_fatal_error with assert.	2025-07-13 14:07:10 -07:00
Fangrui Song	9beb467d92	MC: Store fragment content and fixups out-of-line Moved `Contents` and `Fixups` SmallVector storage to MCSection, enabling trivial destructors for most fragment subclasses and eliminating the need for MCFragment::destroy in ~MCSection. For appending content to the current section, use getContentsForAppending. During assembler relaxation, prefer setContents/setFixups, which may involve copying and reduce the benefits of https://reviews.llvm.org/D145791. Moving only Contents out-of-line caused a slight performance regression (Alexis Engelke's 2024 prototype). By also moving Fragments out-of-line, fragment destructors become trivial, resulting in neglgible instructions:u increase for "stage2-O0-g" and [large max-rss decrease](https://llvm-compile-time-tracker.com/compare.php?from=84e82746c3ff63ec23a8b85e9efd4f7fccf92590&to=555a28c0b2f8250a9cf86fd267a04b0460283e15&stat=max-rss&linkStats=on) for the "stage1-ReleaseLTO-g (link only)" benchmark. ( An older version using fewer inline functions: https://llvm-compile-time-tracker.com/compare.php?from=bb982e733cfcda7e4cfb0583544f68af65211ed1&to=f12d55f97c47717d438951ecddecf8ebd28c296b&linkStats=on ) Now using plain SmallVector in MCSection for storage, with potential for future allocator optimizations, such as allocating `Contents` as the trailing object of MCDataFragment. (GNU Assembler uses gnulib's obstack for fragment management.) Co-authored-by: Alexis Engelke <engelke@in.tum.de> Pull Request: https://github.com/llvm/llvm-project/pull/146307	2025-07-01 00:21:12 -07:00
Fangrui Song	04395be630	MC: Merge MCFragment.h into MCSection.h ... due to their close relationship. MCSection's inline functions (e.g. iterator) access MCFragment, and we want MCFragment's inline functions to access MCSection similarly (#146307). Pull Request: https://github.com/llvm/llvm-project/pull/146315	2025-06-30 09:41:53 -07:00
Fangrui Song	b54337d76c	MC: Enhance mc-dump output * Make pre-layout to -debug-only=mc-dump-pre. This output is not useful for most debugging needs. * Print fragment-associated symbols. Make it easier to locate relevant fragments. * Print the LinkerRelaxable flag.	2025-06-29 00:11:24 -07:00
Fangrui Song	279e808b75	MC: Make mc-dump output compact Remove unneeded details like "<" and ">". Reduce indentation. Omit `this` address to simplify output comparison. Add a -debug-only=mc-dump test. While here, add fixup printing for MCRelaxableFragment.	2025-06-28 22:31:38 -07:00
Fangrui Song	e6b25288eb	MCExpr: Migrate away from operator<< Printing an expression is error-prone without a MCAsmInfo argument. Remove the operator<< overload and replace callers with MCAsmInfo::printExpr. Some callers are changed to MCExpr::print, with the goal of eventually making it private.	2025-06-28 14:41:58 -07:00
Fangrui Song	eedc72b45e	MCSection: Replace DummyFragment with the Subsections[0] head fragment The dummy fragment is primarily used by MCAsmStreamer::emitLabel to track the defined state. We can replace it with an arbitrary fragment. Remove MCDummyFragment introduced for https://github.com/llvm/llvm-project/issues/24860	2025-06-01 01:12:06 -07:00
Fangrui Song	c1931241b0	[MC] Remove fixup_begin/fixup_end	2024-12-22 00:01:42 -08:00
Kazu Hirata	d73d5c8c9b	[MC] Remove unused includes (NFC) (#116317 ) Identified with misc-include-cleaner.	2024-11-15 07:26:22 -08:00
Fangrui Song	fb7028237b	[MC] Move some bool members to MCFragment. NFC Move `AllowAutoPadding` to MCFragment, which reduce the MCRelaxableFragment size by 8 bytes. While here, also move `AlignToBundleEnd` next to `HasInstructions`. Functions that create fragments are slightly shorter due to fewer byte zeroing instructions. Although fewer in number than MCDataFragments, MCRelaxableFragment objects still constitute a significant proportion warranting optimization. ``` % clang -c sqlite3.i -w -g -Xclang -print-stats ... 2206 assembler - Number of emitted assembler fragments - align 83980 assembler - Number of emitted assembler fragments - data 84 assembler - Number of emitted assembler fragments - fill 169462 assembler - Number of emitted assembler fragments - total 11396 assembler - Number of emitted assembler fragments - relaxable ``` Pull Request: https://github.com/llvm/llvm-project/pull/100976	2024-07-29 11:51:51 -07:00
Fangrui Song	de5aa8d006	[MC] Remove unused MCCompactEncodedInstFragment This has been used after #94950.	2024-07-28 22:35:56 -07:00
Fangrui Song	10c894cffd	[MC] Move MCAsmLayout from MCFragment.cpp to MCAssembler.cpp. NFC 8d736236d36ca5c98832b7631aea2e538f6a54aa (2015) moved these MCAsmLayout functions to MCFragment.cpp, but the original placement is better as these functions are tightly coupled with MCAssembler.cpp.	2024-06-30 14:22:25 -07:00
Fangrui Song	2afa193b3b	[MC] Remove MCAsmLayout::invalidateFragmentsFrom The simplification is enabled by 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments eagerly").	2024-06-29 23:32:51 -07:00
Fangrui Song	c9f6a5e495	[MC] Move computeBundlePadding closer to its only caller. NFC There is only one caller after #95188.	2024-06-22 13:28:07 -07:00
Fangrui Song	8cb6e587fd	[MC] Allocate MCFragment with a bump allocator #95197 and 75006466296ed4b0f845cbbec4bf77c21de43b40 eliminated all raw `new MCXXXFragment`. We can now place fragments in a bump allocator. In addition, remove the dead `Kind == FragmentType(~0)` condition. ~CodeViewContext may call `StrTabFragment->destroy()` and need to be reset before `FragmentAllocator.Reset()`. Tested by llvm/test/MC/COFF/cv-compiler-info.ll using asan. Pull Request: https://github.com/llvm/llvm-project/pull/96402	2024-06-22 12:54:19 -07:00
Fangrui Song	82f9b0fbda	[MC] Remove Parent initializer from MCFragment ctor	2024-06-21 23:02:30 -07:00
Fangrui Song	9b44cfbdbf	[MC] Remove unused section parameters from MCFragment constructors	2024-06-21 22:57:12 -07:00
Fangrui Song	27588fe205	[MC] Move MCFragment::Atom to MCSectionMachO::Atoms Mach-O's `.subsections_via_symbols` mechanism associates a fragment with an atom (a non-temporary defined symbol). The current approach (`MCFragment::Atom`) wastes space for other object file formats. After #95077, `MCFragment::LayoutOrder` is only used by `AttemptToFoldSymbolOffsetDifference`. While it could be removed, we might explore future uses for `LayoutOrder`. @aengelke suggests one use case: move `Atom` into MCSection. This works because Mach-O doesn't support `.subsection`, and `LayoutOrder`, as the index into the fragment list, is unchanged. This patch moves MCFragment::Atom to MCSectionMachO::Atoms. `getAtom` may be called at parse time before `Atoms` is initialized, so a bound checking is needed to keep the hack working. Pull Request: https://github.com/llvm/llvm-project/pull/95341	2024-06-13 14:37:15 -07:00
aengelke	846e47e7b8	[MC] Reduce size of MCDataFragment by 8 bytes (#95293 ) Due to alignment, the first two fields of MCEncodedFragment are currently at bytes 40 and 41, so 1 byte over the 8 byte boundary, causing 7 bytes padding to be inserted for the following pointer. Fold two bools of MCFragment into bitfields to reduce move the two fields of MCEncodedFragment one byte earlier to remove the padding bytes. This works, as in the Itanium ABI, there is no padding after base classes. This gives a space reduction of MCDataFragment from 224 to 216 bytes.	2024-06-13 12:04:56 +02:00
Fangrui Song	2cc4bc132c	MCFragment: Initialize Offset to 0 After 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments eagerly") removes the assert of Offset, it is no longer useful to initialize the member to -1. Now the symbol value estimate is more precise, which leads to slight behavior change to layout-interdependency.s.	2024-06-12 14:13:20 -07:00
Fangrui Song	de19f7b6d4	[MC] Replace fragment ilist with singly-linked lists Fragments are allocated with `operator new` and stored in an ilist with Prev/Next/Parent pointers. A more efficient representation would be an array of fragments without the overhead of Prev/Next pointers. As the first step, replace ilist with singly-linked lists. * `getPrevNode` uses have been eliminated by previous changes. * The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and the current insertion point is stored at `CurInsertionPoint`. * `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all fragments within `Frags`. Hexagon programs are usually small, and the performance does not matter that much. To eliminate `Prev`, change the subsection representation to singly-linked lists for subsections and a pointer to the active singly-linked list. The fragments from all subsections will be chained together at layout time. Since fragment lists are disconnected before layout time, we can remove `MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411). The current implementation of `AttemptToFoldSymbolOffsetDifference` requires future improvement for robustness. Pull Request: https://github.com/llvm/llvm-project/pull/95077	2024-06-11 09:18:31 -07:00
Fangrui Song	9d0754ada5	[MC] Relax fragments eagerly Lazy relaxation caused hash table lookups (`getFragmentOffset`) and complex use/compute interdependencies. Some expressions involding forward declared symbols (e.g. `subsection-if.s`) cannot be computed. Recursion detection requires complex `IsBeingLaidOut` (https://reviews.llvm.org/D79570). D76114's `invalidateFragmentsFrom` makes lazy relaxation even less useful. Switch to eager relaxation to greatly simplify code and resolve these issues. This change also removes a `getPrevNode` use, which makes it more feasible to replace the fragment representation, which might yield a large peak RSS win. Minor downsides: The number of section relaxations may increase (offset by avoiding the hash table lookup). For relax-recompute-align.s, the computed layout is not optimal.	2024-06-09 23:05:05 -07:00
Fangrui Song	dcb71c06c7	[MC] Simplify Sec.getFragmentList().insert(Sec.begin(), F). NFC Decrease the uses of getFragmentList() to make it easier to change the fragment list representation.	2024-06-08 15:51:44 -07:00
Amir Ayupov	9fec33aadc	Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653 )" This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9. Accidentally pushed unrelated changes.	2024-01-18 19:59:09 -08:00
Amir Ayupov	82bc33ea3f	[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653 ) Fix the bug where merge-fdata unconditionally outputs boltedcollection line, regardless of whether input files have it set. Test Plan: Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this fix.	2024-01-18 19:44:16 -08:00
David Green	7850c94b86	[NFC] sentinal -> sentinel	2024-01-16 17:22:06 +00:00
spupyrev	eecd41aa09	Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type" This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.	2022-07-11 09:50:47 -07:00
Rafael Auler	6d0528636a	Rebase: [Facebook] [MC] Introduce NeverAlign fragment type Summary: Introduce NeverAlign fragment type. The intended usage of this fragment is to insert it before a pair of macro-op fusion eligible instructions. NeverAlign fragment ensures that the next fragment (first instruction in the pair) does not end at a given alignment boundary by emitting a minimal size nop if necessary. In effect, it ensures that a pair of macro-fusible instructions is not split by a given alignment boundary, which is a precondition for macro-op fusion in modern Intel Cores (64B = cache line size, see Intel Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode Pipeline: Macro-Fusion). This patch introduces functionality used by BOLT when emitting code with MacroFusion alignment already in place. The use case is different from BoundaryAlign and instruction bundling: - BoundaryAlign can be extended to perform the desired alignment for the first instruction in the macro-op fusion pair (D101817). However, this approach has higher overhead due to reliance on relaxation as BoundaryAlign requires in the general case - see https://reviews.llvm.org/D97982#2710638. - Instruction bundling: the intent of NeverAlign fragment is to prevent the first instruction in a pair ending at a given alignment boundary, by inserting at most one minimum size nop. It's OK if either instruction crosses the cache line. Padding both instructions using bundles to not cross the alignment boundary would result in excessive padding. There's no straightforward way to request instruction bundling to avoid a given end alignment for the first instruction in the bundle. LLVM: https://reviews.llvm.org/D97982 Manual rebase conflict history: https://phabricator.intern.facebook.com/D30142613 Test Plan: sandcastle Reviewers: #llvm-bolt Subscribers: phabricatorlinter Differential Revision: https://phabricator.intern.facebook.com/D31361547	2022-07-11 09:31:52 -07:00
Guillaume Chatelet	412c788ab0	[NFC][Alignment] Use Align in MCAlignFragment	2022-06-15 12:31:00 +00:00
Leonard Grey	5d57578a4e	[MC] Recursively calculate symbol offset This is speculative since I'm not sure if there's some implicit contract that a variable symbol must not have another variable symbol in its evaluation tree. Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23. Test is based on alias.s (removed checks since we just need to know it didn't crash). Differential Revision: https://reviews.llvm.org/D109109	2021-10-20 14:29:43 -04:00
Hongtao Yu	705a4c149d	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 17:29:28 -08:00
Mitch Phillips	7ead5f5aa3	Revert "[CSSPGO] Pseudo probe encoding and emission." This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1. Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269	2020-12-10 15:53:39 -08:00
Hongtao Yu	b035513c06	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 09:50:08 -08:00
Jian Cai	c6334db577	[X86] support .nops directive Add support of .nops on X86. This addresses llvm.org/PR45788. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D82826	2020-08-03 11:50:56 -07:00
Thomas Preud'homme	6c67ee0f58	[MC] Fix PR45805: infinite recursion in assembler Give up folding an expression if the fragment of one of the operands would require laying out a fragment already being laid out. This prevents hitting an infinite recursion when a fill size expression refers to a later fragment since computing the offset of that fragment would require laying out the fill fragment and thus computing its size expression. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D79570	2020-06-25 15:42:36 +01:00
Philip Reames	b4c8608eba	Adjust debug output for MCRelaxableFragment to include the size so that sanity checking relaxation offsets from -debug output is easier	2020-03-13 16:22:46 -07:00
Shengchen Kan	3a503ce663	[X86] Reduce the number of emitted fragments due to branch align Summary: Currently, a BoundaryAlign fragment may be inserted after the branch that needs to be aligned to truncate the current fragment, this fragment is unused at most of time. To avoid that, we can insert a new empty Data fragment instead. Non-relaxable instruction is usually emitted into Data fragment, so the inserted empty Data fragment will be reused at a high possibility. Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: reames, LuoYuanke Subscribers: llvm-commits, dexonsmith, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D75438	2020-03-12 15:37:35 +08:00
Shengchen Kan	af57b139a0	Temporarily Revert [X86] Not track size of the boudaryalign fragment during the layout Summary: This reverts commit 2ac19feb1571960b8e1479a451b45ab56da7034e. This commit causes some test cases to run fail when branch is aligned.	2020-03-03 11:15:56 +08:00
Shengchen Kan	2ac19feb15	[X86] Not track size of the boudaryalign fragment during the layout Summary: Currently the boundaryalign fragment caches its size during the process of layout and then it is relaxed and update the size in each iteration. This behaviour is unnecessary and ugly. Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: MaskRay Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75404	2020-03-02 09:32:30 +08:00
Fangrui Song	806a2b1f3d	[MC] Reorder MCFragment members to decrease padding sizeof(MCFragment) does not change, but some if its subclasses do, e.g. on a 64-bit platform, sizeof(MCEncodedFragment) decreases from 64 to 56, sizeof(MCDataFragment) decreases from 224 to 216.	2020-01-05 19:09:40 -08:00
Fangrui Song	2c053109fa	[MC] Delete MCFragment::isDummy. NFC isa<...>, dyn_cast<...> and cast<...> are used by other fragments. Don't make MCDummyFragment special.	2020-01-05 18:49:47 -08:00
Shengchen Kan	70fa4c4f88	[NFC] Style cleanups 1. Remove duplicate function for class name at the beginning of the comment. 2. Use auto where the type is already obvious from the context.	2019-12-23 17:02:36 +08:00

1 2

83 Commits