275 Commits

Author SHA1 Message Date
Maksim Panchenko
c9b1f06288 [BOLT] Introduce MetadataRewriter interface
Introduce the MetadataRewriter interface to handle updates for various
types of auxiliary data stored in a binary file.

To implement metadata processing using this new interface, all metadata
rewriters should derive from the RewriterBase class and implement
one or more of the following methods, depending on the timing of metadata
read and write operations:

  * preCFGInitializer()
  * postCFGInitializer() // TBD
  * preEmitFinalizer()   // TBD
  * postEmitFinalizer()

By adopting this approach, we aim to simplify the RewriteInstance class
and improve its scalability to accommodate new extensions of file formats,
including various metadata types of the Linux Kernel.
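
A minimal sketch of how such an interface could look. Only the four hook names come from the description above; `ExampleNoteRewriter`, `RewriterManager`, `getName()`, and all signatures are assumptions made for illustration and are not BOLT's actual API.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the base class described above.
class RewriterBase {
public:
  virtual ~RewriterBase() = default;
  virtual std::string getName() const = 0;
  // Hooks default to no-ops so a rewriter overrides only what it needs.
  virtual void preCFGInitializer() {}
  virtual void postCFGInitializer() {}
  virtual void preEmitFinalizer() {}
  virtual void postEmitFinalizer() {}
};

// Illustrative rewriter: reads its metadata before the CFG is built and
// writes it back after emission.
class ExampleNoteRewriter : public RewriterBase {
public:
  explicit ExampleNoteRewriter(std::vector<std::string> &Log) : Log(Log) {}
  std::string getName() const override { return "example-note"; }
  void preCFGInitializer() override { Log.push_back("read"); }
  void postEmitFinalizer() override { Log.push_back("write"); }

private:
  std::vector<std::string> &Log;
};

// Hypothetical owner that runs each hook in registration order.
class RewriterManager {
public:
  void registerRewriter(std::unique_ptr<RewriterBase> R) {
    Rewriters.push_back(std::move(R));
  }
  void runPreCFGInitializers() {
    for (auto &R : Rewriters)
      R->preCFGInitializer();
  }
  void runPostCFGInitializers() {
    for (auto &R : Rewriters)
      R->postCFGInitializer();
  }
  void runPreEmitFinalizers() {
    for (auto &R : Rewriters)
      R->preEmitFinalizer();
  }
  void runPostEmitFinalizers() {
    for (auto &R : Rewriters)
      R->postEmitFinalizer();
  }

private:
  std::vector<std::unique_ptr<RewriterBase>> Rewriters;
};
```

With this shape, the driver only calls the four `run*` methods at fixed points, and each metadata type lives in its own class.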

Differential Revision: https://reviews.llvm.org/D154020
2023-07-06 11:09:51 -07:00
Amir Ayupov
fd49cc87d0 [BOLT][NFC] Print functions after attaching profile (-print-profile)
Add an extra point of dumping functions: immediately after attaching the profile information.
This dumping is enabled by newly introduced `-print-profile` and `-print-all`.

The reason is that in `aggregate-only`/perf2bolt mode BOLT may not reach the point of
printing the function after CFG is constructed (`-print-cfg`), while we may still want to inspect
the attached profile, especially for diff'ing purposes.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D153996
2023-06-28 17:51:17 -07:00
Shatian Wang
a89c9b35be [BOLT] Fixing relative ordering of cold sections under multi-way function splitting
Order code sections with names in the form of ".text.cold.i" based on the value of i

[Context] SplitFunctions.cpp implements splitting strategies that can potentially split each function into at most N>2 fragments.
When such N-way splitting happens, new code sections with names ".text.cold.1", ..., ".text.cold.i", ..., ".text.cold.N-2" will be created.
A section with name ".text.cold.i" contains the (i+2)th fragment of each function.
As an example, if each function is split into N=3 fragments: hot, warm, cold, then code sections will now include
- a section with name ".text" containing hot fragments
- a section with name ".text.cold" containing warm fragments
- a section with name ".text.cold.1" containing cold fragments

The order of these new sections in the output binary currently depends on the order in which they are encountered by the emitter.
For example, under N=3-way splitting, if the first function is 2-way split into hot and cold and the second function is 3-way split into hot, warm, and cold
then the cold fragment is encountered first, resulting in the final sections being in the following order
.text (hot), .text.cold.1 (cold), .text.cold (warm)

The above is suboptimal because the distance of jumps/calls between the hot and the warm sections will be much bigger than when ordering the sections as follows
.text (hot), .text.cold (warm), .text.cold.1 (cold)

This diff orders the sections with names in the form of ".text.cold" or ".text.cold.i" based on the value of i (assuming the i-value of ".text.cold" is 0).
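
The ordering rule can be sketched as follows. This is an illustration only: the function names are hypothetical, and placing all non-cold sections ahead of the cold ones is an assumption of the sketch, not something stated above.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Returns the i-value of ".text.cold" (treated as 0) or ".text.cold.i".
// Returns std::nullopt for any other section name.
std::optional<unsigned> getColdIndex(const std::string &Name) {
  const std::string Prefix = ".text.cold";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  if (Name.size() == Prefix.size())
    return 0u;
  // Anything after the prefix must be '.' followed by decimal digits.
  if (Name[Prefix.size()] != '.' || Name.size() == Prefix.size() + 1)
    return std::nullopt;
  unsigned Index = 0;
  for (std::size_t I = Prefix.size() + 1; I < Name.size(); ++I) {
    if (Name[I] < '0' || Name[I] > '9')
      return std::nullopt;
    Index = Index * 10 + static_cast<unsigned>(Name[I] - '0');
  }
  return Index;
}

// Stable-sorts section names so that cold sections follow in i-order.
// Assumption of this sketch: non-cold sections keep their relative order
// and come first.
void orderCodeSections(std::vector<std::string> &Names) {
  auto Key = [](const std::string &Name) -> unsigned {
    std::optional<unsigned> Index = getColdIndex(Name);
    return Index ? *Index + 1 : 0;
  };
  std::stable_sort(Names.begin(), Names.end(),
                   [&](const std::string &A, const std::string &B) {
                     return Key(A) < Key(B);
                   });
}
```

Applied to the problematic order from the example (`.text`, `.text.cold.1`, `.text.cold`), this yields `.text`, `.text.cold`, `.text.cold.1`.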

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D152941
2023-06-22 14:26:48 -07:00
Job Noorman
38ba2824c8 [BOLT] Don't register internal func relocs as external references
Currently, all relocations that point inside a function are registered
as external references. If these relocations cannot be resolved as jump
tables or computed gotos, the containing function gets marked as
not-simple and excluded from optimizations.

RISC-V uses relocations for branches and jumps (to support linker
relaxation) and as such, almost no functions get marked as simple. This
patch fixes this by only registering relocations that originate outside
of the referenced function as external references.
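
The distinction can be sketched as a simple predicate. `FunctionRange` and `isExternalReference` are hypothetical names for illustration and do not reflect BOLT's actual data structures.

```cpp
#include <cstdint>

// Hypothetical description of a function's address range [Start, Start+Size).
struct FunctionRange {
  uint64_t Start;
  uint64_t Size;
  bool contains(uint64_t Address) const {
    return Address >= Start && Address < Start + Size;
  }
};

// A relocation that points inside Target counts as an external reference
// only if the relocation itself is located outside Target.
bool isExternalReference(const FunctionRange &Target,
                         uint64_t RelocationSiteAddress) {
  return !Target.contains(RelocationSiteAddress);
}
```

A branch relocation at an address inside the same function is therefore not registered, which is the common case on RISC-V.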

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D153345
2023-06-22 09:35:54 +02:00
Job Noorman
b410d24a19 [BOLT][RISCV] Implement R_RISCV_ADD32/SUB32
This patch implements the R_RISCV_ADD32 and R_RISCV_SUB32 relocations for
RISC-V.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D146554
2023-06-22 09:35:54 +02:00
Amir Ayupov
82ef86c194 [BOLT] Set IsRelro section attribute based on PT_GNU_RELRO segment
Handle PT_GNU_RELRO segment in accordance with Linux Standard Base spec
chapter 12:

> PT_GNU_RELRO
> The array element specifies the location and size of a segment which may
> be made *read-only* after relocations have been processed.

Perform a readelf-style mapping check between this segment and sections,
set `IsRelro` section attribute.
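
A simplified sketch of such a mapping check. The struct and function names are hypothetical, and readelf's real check handles additional cases (for example TLS and zero-sized sections) that are left out here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, reduced views of a program header and a section.
struct Segment {
  uint64_t VAddr;
  uint64_t MemSize;
};

struct Section {
  std::string Name;
  uint64_t Address;
  uint64_t Size;
  bool IsRelro;
};

// True if the section's address range lies entirely within the segment.
bool isSectionInSegment(const Section &Sec, const Segment &Seg) {
  return Sec.Address >= Seg.VAddr &&
         Sec.Address + Sec.Size <= Seg.VAddr + Seg.MemSize;
}

// Sets IsRelro on every section covered by the PT_GNU_RELRO segment.
void markRelroSections(std::vector<Section> &Sections, const Segment &Relro) {
  for (Section &Sec : Sections)
    Sec.IsRelro = isSectionInSegment(Sec, Relro);
}
```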

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D152944
2023-06-20 20:44:18 -07:00
Kazu Hirata
e7541f561d [BOLT] Use llvm::is_contained (NFC) 2023-06-18 11:53:01 -07:00
Job Noorman
f873029386 [BOLT] Add minimal RISC-V 64-bit support
Just enough features are implemented to process a simple "hello world"
executable and produce something that still runs (including libc calls).
This was mainly a matter of implementing support for various
relocations. Currently, the following are handled:

- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE

Executables linked with linker relaxation will probably fail to be
processed. BOLT relocates .text to a high address while leaving .plt at
its original (low) address. This causes PC-relative PLT calls that were
relaxed to a JAL to no longer fit their offset in a J-immediate. This
is something that will be addressed in a later patch.

Changes to the BOLT core are relatively minor. Two things were tricky to
implement and needed slightly larger changes. I'll explain those below.

The R_RISCV_CALL(_PLT) relocation is put on the first instruction of an
AUIPC/JALR pair, the second does not get any relocation (unlike other
PCREL pairs). This causes issues with the combination of the way BOLT
processes binaries and the way the RISC-V MC-layer handles relocations:
- BOLT reassembles instructions one by one and since the JALR doesn't
  have a relocation, it simply gets copied without modification;
- Even though the MC-layer handles R_RISCV_CALL properly (adjusts both
  the AUIPC and the JALR), it assumes the immediates of both
  instructions are 0 (to be able to or-in a new value). This will most
  likely not be the case for the JALR that got copied over.

To handle this difficulty without resorting to RISC-V-specific hacks in
the BOLT core, a new binary pass was added that searches for
AUIPC/JALR pairs and zeroes-out the immediate of the JALR.
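
The idea behind that pass can be sketched on raw instruction words. This is not the actual BOLT pass: it assumes a buffer of 32-bit, uncompressed instructions, and the function names are hypothetical. The opcodes and field positions follow the RISC-V base encoding (AUIPC opcode 0x17, JALR opcode 0x67 with funct3 = 0, I-type immediate in bits 31:20).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t OpcodeMask = 0x7f;
constexpr uint32_t OpcodeAUIPC = 0x17;
constexpr uint32_t OpcodeJALR = 0x67;

inline uint32_t getRd(uint32_t Insn) { return (Insn >> 7) & 0x1f; }
inline uint32_t getRs1(uint32_t Insn) { return (Insn >> 15) & 0x1f; }
inline uint32_t getFunct3(uint32_t Insn) { return (Insn >> 12) & 0x7; }

inline bool isAUIPC(uint32_t Insn) {
  return (Insn & OpcodeMask) == OpcodeAUIPC;
}

inline bool isJALR(uint32_t Insn) {
  return (Insn & OpcodeMask) == OpcodeJALR && getFunct3(Insn) == 0;
}

// Clears the 12-bit immediate of every JALR that directly follows an AUIPC
// writing the register the JALR jumps through. Returns the number of pairs
// that were found.
std::size_t zeroJALRImmediates(std::vector<uint32_t> &Code) {
  std::size_t Pairs = 0;
  for (std::size_t I = 0; I + 1 < Code.size(); ++I) {
    if (!isAUIPC(Code[I]) || !isJALR(Code[I + 1]))
      continue;
    if (getRd(Code[I]) != getRs1(Code[I + 1]))
      continue;
    Code[I + 1] &= 0x000fffffu; // keep rs1, funct3, rd, opcode; drop imm
    ++Pairs;
  }
  return Pairs;
}
```

After this step the JALR immediate is 0, which matches what the MC-layer expects before it or-s in the new value.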

A second difficulty was supporting ABS symbols. As far as I can tell,
ABS symbols were not handled at all, causing __global_pointer$ to break.
RewriteInstance::analyzeRelocation was updated to handle these
generically.

Tests are provided for all supported relocations. Note that in order to
test the correct handling of PLT entries, an ELF file produced by GCC
had to be used. While I tried to strip the YAML representation, it's
still quite large. Any suggestions on how to improve this would be
appreciated.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D145687
2023-06-16 12:19:36 +02:00
Job Noorman
05634f7346 [BOLT] Move from RuntimeDyld to JITLink
RuntimeDyld has been deprecated in favor of JITLink. [1] This patch
replaces all uses of RuntimeDyld in BOLT with JITLink.

Care has been taken to minimize the impact on the code structure in
order to ease the inspection of this (rather large) changeset. Since
BOLT relied on the RuntimeDyld API in multiple places, this wasn't
always possible though and I'll explain the changes in code structure
first.

Design note: BOLT uses a JIT linker to perform what essentially is
static linking. No linked code is ever executed; the result of linking
is simply written back to an executable file. For this reason, I
restricted myself to the use of the core JITLink library and avoided ORC
as much as possible.

RuntimeDyld contains methods for loading objects (loadObject) and symbol
lookup (getSymbol). Since JITLink doesn't provide a class with a similar
interface, the BOLTLinker abstract class was added to implement it. It
was added to Core since both the Rewrite and RuntimeLibs libraries make
use of it. Wherever a RuntimeDyld object was used before, it was
replaced with a BOLTLinker object.

There is one major difference between the RuntimeDyld and BOLTLinker
interfaces: in JITLink, section allocation and the application of fixups
(relocation) happens in a single call (jitlink::link). That is, there is
no separate method like finalizeWithMemoryManagerLocking in RuntimeDyld.
BOLT used to remap sections between allocating (loadObject) and linking
them (finalizeWithMemoryManagerLocking). This doesn't work anymore with
JITLink. Instead, BOLTLinker::loadObject accepts a callback that is
called before fixups are applied which is used to remap sections.
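
The callback-based flow can be sketched with a toy linker. None of these types exist in BOLT or JITLink; `ToyLinker`, `ToySection`, and `ToyFixup` are assumptions that only illustrate the ordering of allocation, remapping, and fixup application inside a single call.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy model: a section with an assigned address and a fixup that resolves
// to "address of target section + offset".
struct ToySection {
  std::string Name;
  uint64_t Address;
};

struct ToyFixup {
  std::string TargetSection;
  uint64_t Offset;
  uint64_t Resolved;
};

class ToyLinker {
public:
  using SectionsMapper = std::function<void(std::vector<ToySection> &)>;

  // Allocation and fixups happen in one call; the mapper runs in between,
  // which is the only chance the caller gets to remap sections.
  void loadObject(std::vector<ToySection> &Sections,
                  std::vector<ToyFixup> &Fixups, SectionsMapper MapSections) {
    uint64_t Next = 0x1000; // provisional addresses
    for (ToySection &S : Sections) {
      S.Address = Next;
      Next += 0x1000;
    }
    if (MapSections)
      MapSections(Sections);
    for (ToyFixup &F : Fixups)
      for (const ToySection &S : Sections)
        if (S.Name == F.TargetSection)
          F.Resolved = S.Address + F.Offset;
  }
};
```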

The actual implementation of the BOLTLinker interface lives in the
JITLinkLinker class in the Rewrite library. It's the only part of the
BOLT code that should directly interact with the JITLink API.

To load an object, JITLinkLinker first creates a LinkGraph
(jitlink::createLinkGraphFromObject) and then links it (jitlink::link).
For the latter, it uses a custom JITLinkContext with the following
properties:
- Use BOLT's ExecutableFileMemoryManager. This one was updated to
  implement the JITLinkMemoryManager interface. Since BOLT never
  executes code, its finalization step is a no-op.
- Pass config: don't use the default target passes since they modify
  DWARF sections in a way that seems incompatible with BOLT. Also run a
  custom pre-prune pass that makes sure sections without symbols are not
  pruned by JITLink.
- Implement symbol lookup. This used to be implemented by
  BOLTSymbolResolver.
- Call the section mapper callback before the final linking step.
- Copy symbol values when the LinkGraph is resolved. Symbols are stored
  inside JITLinkLinker to ensure that later objects (i.e.,
  instrumentation libraries) can find them. This functionality used to
  be provided by RuntimeDyld but I did not find a way to use JITLink
  directly for this.

Some more minor points of interest:
- BinarySection::SectionID: JITLink doesn't have something equivalent to
  RuntimeDyld's Section IDs. Instead, sections can only be referred to
  by name. Hence, SectionID was updated to a string.
- There seem to be no tests for Mach-O. I've tested a small hello-world
  style binary but not more than that.
- On Mach-O, JITLink "normalizes" section names to include the segment
  name. I had to parse the section name back from this manually which
  feels slightly hacky.

[1] https://reviews.llvm.org/D145686#4222642

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D147544
2023-06-15 11:13:52 +02:00
Maksim Panchenko
1ebad216ef [BOLT][NFCI] Remove redundant instance of MCAsmBackend
Use instance of MCAsmBackend from BinaryContext instead of creating a
new one.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D152849
2023-06-13 13:14:05 -07:00
Maksim Panchenko
c4e60a7f60 [BOLT] Fix --max-funcs=<N> option
Fix off-by-one error while handling the --max-funcs=<N> option.
We used to process N+1 functions when N was requested.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D152751
2023-06-12 16:54:14 -07:00
Christian Ulmann
f5425c128a [LoopInfo] Move generic LoopInfo into own files
This commit splits the generic part of `LoopInfo` into separate files.
These new `GenericLoopInfo` files are located in `llvm/Support` to be in line
with `GenericDomTree`.

Furthermore, this change ensures that MLIR's Bazel build does not have
to link against `LLVMAnalysis` just to use these template headers.

Depends on D148219

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D148235
2023-04-24 06:07:05 +00:00
Nathan Sidwell
5b9f0309d6 [BOLT] Remove unsupported ELF type reloc handling
Drop unsupported ELF format reloc handling -- RewriteInstance lacks
this flexibility elsewhere.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D148946
2023-04-23 13:09:37 -04:00
Nathan Sidwell
ffb42e313d [BOLT] Remove unneeded dyncasts
These checks are unnecessary -- we've already bailed if the format was wrong.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D148848
2023-04-21 13:40:54 -04:00
Nathan Sidwell
9c92b023da [BOLT][NFC] Move phdr typedef to cpp file
This typedef is only used inside the RewriteInstance source file, let's not
expose it in the header file -- even if private.

Differential Revision: https://reviews.llvm.org/D148667
2023-04-19 15:51:17 -04:00
Nathan Sidwell
f2f0411924 [BOLT] Adjust Shdr alignment
Shdr's are not necessarily size 2^n, and there is no reason to align to
that boundary if they are.

Differential Revision: https://reviews.llvm.org/D148666
2023-04-19 15:51:12 -04:00
Job Noorman
48ad4296f7 [BOLT] Fix use-after-free in RewriteInstance::mapCodeSections
When a cold function is too large, its section gets deregistered.
However, the section is still dereferenced later to get its RuntimeDyld
ID. This patch moves the deregistration to after the last dereference.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D148427
2023-04-17 16:16:49 +02:00
Job Noorman
54ab954149 [BOLT] Reject symbols pointing to section end
Sometimes, symbols are present that point to the end of a section (i.e.,
one-past the highest valid address). Currently, BOLT either rejects
those symbols when they don't point to another existing section, or errs
when they do and the other section is not executable. I suppose BOLT
would accept the symbol when it points to an executable section.

In any case, these symbols should not be considered while discovering
functions and should not result in an error. This patch implements that.

Note that this patch checks explicitly for symbols whose value equals
the end of their section. It might make more sense to verify that the
symbol's value is within [section start, section end). However, I'm not
sure if this could ever happen *and* its value does not equal the end.

Another way to implement this is to verify that the BinarySection we
find at the symbol's address actually corresponds to the symbol's
section. I'm not sure what the best approach is so feedback is welcome.
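
The two checks discussed above can be compared side by side in a small sketch. `SectionInfo` and both function names are hypothetical; the sketch does not claim which variant the patch uses beyond what the text says.

```cpp
#include <cstdint>

// Hypothetical reduced view of a section: [Address, Address + Size).
struct SectionInfo {
  uint64_t Address;
  uint64_t Size;
};

// Explicit check described above: the symbol's value equals the end of
// its own section (one past the highest valid address).
bool pointsToSectionEnd(uint64_t SymbolValue, const SectionInfo &Sec) {
  return SymbolValue == Sec.Address + Sec.Size;
}

// Alternative mentioned above: the value lies within
// [section start, section end).
bool isWithinSection(uint64_t SymbolValue, const SectionInfo &Sec) {
  return SymbolValue >= Sec.Address && SymbolValue < Sec.Address + Sec.Size;
}
```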

Reviewed By: yota9, rafauler

Differential Revision: https://reviews.llvm.org/D146215
2023-03-21 13:59:39 +04:00
Vladislav Khmelevsky
f9bf9f925e [BOLT] Add .relr.dyn section support
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D146085
2023-03-17 17:24:19 +04:00
Kazu Hirata
4e585e51c1 Use *{Map,Set}::contains (NFC) 2023-03-15 22:55:35 -07:00
Vladislav Khmelevsky
207ea5f2e4 [BOLT] Add writable segment for allocatable sections
The golang support creates 2 new data segments, one of which contains
relocations in PIC binaries, so the section must be writable.
Currently BOLT creates only one new segment, which contains new sections
with RX rights. Now also create an RW segment if any new writable
sections were allocated during BOLT binary processing.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D143390
2023-03-15 00:06:55 +04:00
Vladislav Khmelevsky
7117af529e [BOLT] Improve dynamic relocations support for CI
This patch fixes a few problems with supporting dynamic relocations in CI.
1. After dynamic relocations and functions are read, search for dynamic
relocations located in functions. Currently we expect them to be
relative and to be located in a constant island only. Mark islands of such
functions as having dynamic relocations and create a CI access symbol at the
relocation offset, so that a BD is created for that location.
2. During function disassembly, when handling an address reference to a
constant island, check whether the referenced external CI has a dynamic
relocation. If it does, continue to refer to the original CI
rather than creating a local copy.
3. After the function disassembly stage, mark functions that have a dynamic
reloc in CI as non-simple. We don't want such functions to be optimized, since
passes such as split functions would create 2 copies of the CI, which we
are currently unable to support.
4. While updating output values for a BF, search for BDs located in CI and
update their output locations.
5. At the dynamic relocation patching stage, search for binary data located
at the relocation offset. If it was moved, use the new relocation offset value
rather than the old one.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D143748
2023-03-13 13:37:28 +04:00
Amir Ayupov
c49941bd0d [BOLT] Process fragment siblings in lite mode, keep lite mode on
In lite mode, add split function fragments to the list of functions to
process even if a fragment has no samples. This is required to properly
detect and update split jump tables (jump tables that contain pointers to code
in the main and cold fragments).

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D140457
2023-02-08 19:11:27 -08:00
yavtuk
0776fc32b1 [BOLT] Search section based on relocation symbol
We need to search for the referenced section based on the relocation
symbol's section to properly match end-of-section symbols. For example, on
some binaries we can observe that init_array_end/fini_array_end might be
"placed" in a gap. Since no section can be found for that address, the
relocation would be skipped, resulting in a wrong ADRP imm after emitting
the new text and a SIGSEGV in the output binary.

Credits for the test to Vladislav Khmelevskii aka yota9.
2023-02-08 00:15:56 +03:00
Amir Ayupov
c8482da779 [BOLT] Reintroduce allow-stripped
Reject stripped binaries as a policy.

The core issue with stripped binaries is that we can't detect the presence
of split functions which require extra handling. Therefore BOLT can't ensure
functional correctness of produced binary if the input stripped binary contains
split functions. Supporting such cases is an interesting problem but it goes
against BOLT's intended goal of achieving peak program performance.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D142686
2023-02-06 18:08:13 -08:00
Amir Ayupov
16492a6143 [BOLT][NFC] Rename {MachO,}RewriteInstance::create methods
Follow the code style of fallible constructors in [LLVM Programmer's Manual]
(https://llvm.org/docs/ProgrammersManual.html#fallible-constructors)
and rename `RewriteInstance::createRewriteInstance` to `RewriteInstance::create`

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D143119
2023-02-02 12:30:45 -08:00
Amir Ayupov
72e5b14fe7 [BOLT][NFC] Use llvm::make_second_range
Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D143019
2023-02-02 12:02:31 -08:00
Amir Ayupov
287508cd9c [BOLT] Use LTO fuzzy name matching in function-order
Allow partial name matching wrt LTO suffixes in `function-order`
user-supplied function list, the same as permitted by profile matching.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D142269
2023-01-25 11:43:10 -08:00
Amir Ayupov
69a9bbf106 [BOLT][NFC] Replace ambiguous BinarySection::isReadOnly with isWritable
Address feedback in https://reviews.llvm.org/D102284#2755060

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D141733
2023-01-18 14:53:07 -08:00
Amir Ayupov
43f382a9f4 [BOLT][NFC] Simplify handleRelocation
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D132089
2023-01-18 14:19:35 -08:00
Kazu Hirata
e8d6c537ac [BOLT] Use std::optional instead of llvm::Optional (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-02 18:40:21 -08:00
Amir Ayupov
703d94d8f0 [BOLT] Respect -function-order in lite mode
Process functions listed in -function-order file even in lite mode.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D140435
2022-12-28 20:50:20 -08:00
Vladislav Khmelevsky
17ed8f2928 [BOLT][AArch64] Handle adrp+ld64 linker relaxations
The linker might relax adrp + ldr GOT address loading to adrp + add for
local non-preemptible symbols (e.g. hidden/protected symbols in an
executable). Since the linker usually doesn't update relocations properly
after relaxation, we have to handle such cases ourselves. To do that,
while reading relocations we change the LD64 reloc to ADD if an instruction
mismatch is found, and introduce FixRelaxationPass, which searches for
ADRP+ADD pairs and, after performing some checks, replaces the ADRP target
symbol with that of the already fixed ADD.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D138097
2022-12-23 01:20:18 +04:00
Maksim Panchenko
be9d3edee8 [BOLT][NFC] Remove unused PrintInstructions argument
PrintInstructions was unused in BinaryFunction::print() and dump().

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D140440
2022-12-20 15:57:13 -08:00
Amir Ayupov
72528ee4b4 [BOLT][NFC] Use std::optional in has*NameRegex 2022-12-11 22:13:47 -08:00
Amir Ayupov
6e5b4dacf3 [BOLT][NFC] Use std::optional in RI 2022-12-11 22:13:46 -08:00
Kazu Hirata
e324a80fab [BOLT] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 23:12:38 -08:00
Kazu Hirata
1fa870b1bd Use None consistently (NFC)
This patch replaces NoneType() and NoneType::None with None in
preparation for migration from llvm::Optional to std::optional.

In the std::optional world, we are not guaranteed to be able to
default-construct std::nullopt_t or peek what's inside it, so neither
NoneType() nor NoneType::None has a corresponding expression in the
std::optional world.

Once we consistently use None, we should even be able to replace the
contents of llvm/include/llvm/ADT/None.h with something like:

  using NoneType = std::nullopt_t;
  inline constexpr std::nullopt_t None = std::nullopt;

to ease the migration from llvm::Optional to std::optional.

Differential Revision: https://reviews.llvm.org/D138376
2022-11-20 00:24:40 -08:00
Alexey Moksyakov
1fb186198a adds huge pages support of PIE/no-PIE binaries
This patch adds huge pages support (-hugify) for PIE/no-PIE
binaries. It also restores the functionality that supports kernels < 5.10,
where the dynamic loader has a problem with the alignment of
page addresses.

Differential Revision: https://reviews.llvm.org/D129107
2022-11-04 15:14:21 +03:00
Hongtao Yu
d5a963ab8b [PseudoProbe] Replace relocation with offset for entry probe.
Currently pseudo probe encoding for a function is like:
	- For the first probe, a relocation from it to its physical position in the code body
	- For subsequent probes, an incremental offset from the current probe to the previous probe

The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address.
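
The offset scheme can be illustrated with plain integers. This sketch only models the address arithmetic (first probe relative to the function start, later probes relative to the previous probe); it ignores the real section layout, LEB128 encoding, and probe attributes, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Encodes probe addresses as deltas: the first delta is taken from the
// function start, each later delta from the previous probe.
// Assumes ProbeAddresses is sorted in ascending order.
std::vector<uint64_t>
encodeProbeOffsets(uint64_t FunctionStart,
                   const std::vector<uint64_t> &ProbeAddresses) {
  std::vector<uint64_t> Offsets;
  uint64_t Previous = FunctionStart;
  for (uint64_t Address : ProbeAddresses) {
    Offsets.push_back(Address - Previous);
    Previous = Address;
  }
  return Offsets;
}

// Reverses the encoding given the function start address.
std::vector<uint64_t> decodeProbeOffsets(uint64_t FunctionStart,
                                         const std::vector<uint64_t> &Offsets) {
  std::vector<uint64_t> Addresses;
  uint64_t Current = FunctionStart;
  for (uint64_t Offset : Offsets) {
    Current += Offset;
    Addresses.push_back(Current);
  }
  return Addresses;
}
```

Because every value is a small delta rather than an absolute position, no relocation from the probe data into the code is needed in this model.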

A source function could be lowered into multiple binary functions due to outlining (e.g., coro-split). Since those binary functions have independent link-time layouts, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to the previous section-based emission scheme, which now switches to being function-based. The assembly form of the pseudo probe directive is also changed correspondingly, i.e., to reflect the binary function name.

Most of the source functions end up with only one binary function. For those that don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For example, given source function

```
Foo() {
  …
  Probe 1
  …
  Probe 2
}
```

If it is transformed into two binary functions:

```
Foo:
   …

Foo.outlined:
   …
```

The encoding for the two binary functions will be separate:

```

GUID of Foo
  Probe 1

GUID of Foo
  Sentinel probe of Foo.outlined
  Probe 2
```

Then Probe 1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's no accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address.

On the BOLT side, to be minimally intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since, unlike the linker, BOLT processes the pseudo probe section as a whole and is free from relocation overflow issues.

The change is backward compatible as long as there's no mixed use of the old encoding and the new encoding.

Reviewed By: wenlei, maksfb

Differential Revision: https://reviews.llvm.org/D135912
Differential Revision: https://reviews.llvm.org/D135914
Differential Revision: https://reviews.llvm.org/D136394
2022-10-27 13:28:22 -07:00
Maksim Panchenko
20204db503 [BOLT] Add mold-style PLT support
The mold linker creates symbols for PLT entries and that caught BOLT by
surprise. Add support for marked PLT entries.

Fixes: #58498

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D136655
2022-10-25 11:03:52 -07:00
Rafael Auler
c0d954a068 [BOLT] Ignore duplicate global symbols
We noticed some binaries with duplicated global symbol
entries (same name, address and size). Ignore them as it is possibly a
bug in the linker, and continue processing, unless the symbol has a
different size or address.
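
A sketch of the duplicate check under stated assumptions: `GlobalSymbolTable`, `GlobalSymbol`, and `AddResult` are hypothetical, and what happens on a conflicting entry is left to the caller because the text above does not spell it out.

```cpp
#include <cstdint>
#include <map>
#include <string>

struct GlobalSymbol {
  std::string Name;
  uint64_t Address;
  uint64_t Size;
};

enum class AddResult { Added, IgnoredDuplicate, Conflict };

// Hypothetical table of global symbols keyed by name.
class GlobalSymbolTable {
public:
  AddResult add(const GlobalSymbol &Sym) {
    auto It = Symbols.find(Sym.Name);
    if (It == Symbols.end()) {
      Symbols.emplace(Sym.Name, Sym);
      return AddResult::Added;
    }
    const GlobalSymbol &Existing = It->second;
    // Same name, address and size: treat as a harmless duplicate.
    if (Existing.Address == Sym.Address && Existing.Size == Sym.Size)
      return AddResult::IgnoredDuplicate;
    // Same name but different address or size: report to the caller.
    return AddResult::Conflict;
  }

private:
  std::map<std::string, GlobalSymbol> Symbols;
};
```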

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D136122
2022-10-19 11:52:06 -07:00
Maksim Panchenko
28d70d3f1e [BOLT][NFC] Refactor EFMM initialization
Move EFMM initialization code to emitAndLink(), where EFMM is used.

Reviewed By: yavtuk

Differential Revision: https://reviews.llvm.org/D136205
2022-10-18 20:31:10 -07:00
Maksim Panchenko
dc8035bddd [BOLT][NFCI] Avoid calling registerName() twice
Calling registerName() for the same symbol twice, even with a different
size, has no effect other than the lookup overhead. Avoid the
redundancy.

Fixes facebookincubator/BOLT#299

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D136115
2022-10-17 16:16:31 -07:00
Maksim Panchenko
4d3a0cade2 [BOLT] Section-handling refactoring/overhaul
Simplify the logic of handling sections in BOLT. This change brings more
direct and predictable mapping of BinarySection instances to sections in
the input and output files.

* Only sections from the input binary will have a non-null SectionRef.
  When a new section is created as a copy of the input section,
  its SectionRef is reset to null.

* RewriteInstance::getOutputSectionName() is removed as the section name
  in the output file is now defined by BinarySection::getOutputName().

* Querying BinaryContext for sections by name uses their original name.
  E.g., getUniqueSectionByName(".rodata") will return the original
  section even if the new .rodata section was created.

* Input file sections (with relocations applied) are emitted via MC with
  ".bolt.org" prefix. However, their name in the output binary is
  unchanged unless a new section with the same name is created.

* New sections are emitted internally with ".bolt.new" prefix if there's
  a name conflict with an input file section. Their original name is
  preserved in the output file.

* Section header string table is properly populated with section names
  that are actually used. Previously we used to include discarded
  section names as well.

* Fix the problem when dynamic relocations were propagated to a new
  section with a name that matched a section in the input binary.
  E.g., the new .rodata with jump tables had dynamic relocations from
  the original .rodata.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D135494
2022-10-13 23:10:39 -07:00
Maksim Panchenko
0b213c9090 [BOLT] Fix writing out unmarked .eh_frame section
When BOLT updates .eh_frame section, it concatenates newly-generated
contents (from CFI directives) with the original .eh_frame that has
relocations applied to it. However, if no new content is generated,
the original .eh_frame has to be left intact. In that case, BOLT was
still writing out the relocatable copy of the original .eh_frame section
to the new segment, even though this copy was never used and was not
even marked in the section header table.

Detect the scenario above and skip allocating extra space for .eh_frame.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D135223
2022-10-07 11:19:51 -07:00
Maksim Panchenko
c683e281cd [BOLT] Properly set _end symbol
To properly set the "_end" symbol, we need to track the last allocatable
address. Simply emitting "_end" at the end of some section is not
sufficient since the order of section allocation is unknown during the
emission step.
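
Tracking the last allocatable address can be sketched as a maximum over section end addresses. `OutputSection` and `computeEndAddress` are hypothetical names; the sketch assumes final addresses are already known for all sections.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical reduced view of an output section.
struct OutputSection {
  uint64_t Address;
  uint64_t Size;
  bool IsAllocatable;
};

// Returns the highest end address among allocatable sections, i.e. a
// candidate value for "_end". Returns 0 if there are none.
uint64_t computeEndAddress(const std::vector<OutputSection> &Sections) {
  uint64_t End = 0;
  for (const OutputSection &Sec : Sections)
    if (Sec.IsAllocatable)
      End = std::max(End, Sec.Address + Sec.Size);
  return End;
}
```

The result does not depend on the order in which sections were emitted, which is the property the commit message asks for.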

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D135121
2022-10-07 11:19:14 -07:00
Maksim Panchenko
3e097fab5a [BOLT][NFC] Remove text section assertion
We can emit a binary without a new text section. Hence, the text section
assertion is not needed.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D135120
2022-10-07 11:18:37 -07:00
Huan Nguyen
153eeb4a5e [BOLT] Disable -lite when split function is present
In lite mode, BOLT only transforms a subset of functions, leaving the
remaining functions intact.

For NoPIC, it is fine. BOLT can scan relocations and fix-up all refs
that point to any function body in the subset.

For PIC without split functions, it is also fine. Since jump tables are
intra-procedural transfers, BOLT can find both the jump table base and the
target within the same function. Thus, BOLT can update and/or move jump
tables.

However, it is wrong to process a subset of functions in split function
PIC. This is because BOLT does not know if functions in the subset are
isolated, i.e., cannot be accessed by functions out of the subset,
especially via split jump table.

For example, BOLT only processes three functions A, B and C. Suppose that
A is reached via jump table from A.cold, which is not processed. When
A is moved (due to optimization), the jump table in A.cold is invalid.
We cannot fix-up this jump table since it is only recognized in A.cold,
which BOLT does not process.

Solution: Disable lite mode if split function is present.

Future improvement: In lite mode, if split function is found, BOLT
processes both functions in the subset and all of their sibling
fragments.
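
The future improvement could be sketched as expanding the selected set with sibling fragments. This is an illustration only: it assumes fragments are named by appending `.cold` (optionally followed by more text) to the parent name, as in the `A.cold` example above, and the function names are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Assumption: a fragment "A.cold" or "A.cold.1" belongs to parent "A".
std::string getParentName(const std::string &Name) {
  const std::string::size_type Pos = Name.find(".cold");
  return Pos == std::string::npos ? Name : Name.substr(0, Pos);
}

// Returns Selected plus every function in AllFunctions that shares a
// parent with a selected function.
std::set<std::string>
addSiblingFragments(const std::set<std::string> &Selected,
                    const std::vector<std::string> &AllFunctions) {
  std::set<std::string> Parents;
  for (const std::string &Name : Selected)
    Parents.insert(getParentName(Name));

  std::set<std::string> Result = Selected;
  for (const std::string &Name : AllFunctions)
    if (Parents.count(getParentName(Name)))
      Result.insert(Name);
  return Result;
}
```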

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir, maksfb

Differential Revision: https://reviews.llvm.org/D131283
2022-09-28 19:26:17 +02:00
Amir Ayupov
39336fc09c [BOLT] Control aggregation mode output profile file format
In perf2bolt and `-aggregate-only` BOLT mode, the output profile file is written
in fdata format by default. Provide a knob `-profile-format=[fdata,yaml]` to
control the format.
Note that `-w` option still dumps in YAML format.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D133995
2022-09-19 13:37:10 -07:00