181 Commits

Author SHA1 Message Date
Micah Weston
a3f61c8bfd
[SHT_LLVM_BB_ADDR_MAP][obj2yaml] Implements PGOAnalysisMap for elf2yaml and tests. (#80924)
Adds support to obj2yaml for PGO Analysis Map. Adds a test to both
obj2yaml and yaml2obj.
2024-02-13 21:53:05 -05:00
Rahman Lavaee
acec6419e8
[SHT_LLVM_BB_ADDR_MAP] Allow basic-block-sections and labels be used together by decoupling the handling of the two features. (#74128)
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
    
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.

First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.

| | |
|--|--|
| Address of the function | Function Address |
|  Number of basic blocks in this function | NumBlocks |
|  BB entry 1
|  BB entry 2
|   ...
|  BB entry #NumBlocks
    
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
    
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section section as before: the base address of the section, its
number of blocks, and BB entries for its basic block. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
|  Number of sections for this function   | NumBBRanges |
| Section 1 begin address                     | BaseAddress[1]  |
| Number of basic blocks in section 1 | NumBlocks[1]    |
| BB entries for Section 1
|..................|
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges |
NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges
    
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
    
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
    
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.

We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handing to their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
  - New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
'-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
2024-02-01 17:50:46 -08:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00
Kazu Hirata
ac4a272913 [llvm] Stop including llvm/ADT/DenseSet.h (NFC)
Identified with clangd.
2023-11-11 09:48:29 -08:00
Job Noorman
8545093715 [obj2yaml] Emit ProgramHeader.Offset
Currently, obj2yaml doesn't emit the offset of program headers, leaving
it to yaml2obj to calculate offsets based on `FirstSec` and `LastSec`.
This causes an obj2yaml->yaml2obj round trip to often produce an ELF
file that is not equivalent to the original, especially since it seems
common to have program headers at offset 0 whose first section starts at
a higher address. Besides being non-equivalent, the produced ELF files
also do not seem to work propery and readelf complains about them.

Taking a simple hello world program in C, compiled using either GCC or
Clang, the original ELF file has the following program headers (only
showing some relevant ones):

```
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000630 0x0000000000000630  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000000161 0x0000000000000161  R E    0x1000
...

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .text .fini
...
```

While this is the result of an obj2yaml->yaml2obj round trip:

```
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000000 0x0000000000000040 0x0000000000000040
                 0x0000000000000000 0x0000000000000000  R      0x8
readelf: Error: the PHDR segment is not covered by a LOAD segment
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000318 0x0000000000000000 0x0000000000000000
                 0x0000000000000318 0x0000000000000318  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000000161 0x0000000000000161  R E    0x1000
...

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02
   03     .init .plt .text .fini
...
```

Note that the offset of segment 2 changed from 0x0 to 0x318. This has
two effects:
- readelf complains "Error: the PHDR segment is not covered by a LOAD
  segment" since PHDR was originally covered by segment 2 but not
  anymore;
- Segment 2 effectively became empty according to the section to segment
  mapping.

I addition to these, the output doesn't correctly execute anymore,
crashing with a "SIGSEGV (Address boundary error)".

This patch fixes the difference in program header layout after a round
trip by explicitly emitting offsets.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D145555
2023-09-25 10:21:53 +02:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Fangrui Song
689715f335 [Object] Fix handling of Elf_Nhdr with sh_addralign=8
The generic ABI says:

> Padding is present, if necessary, to ensure 8 or 4-byte alignment for the next note entry (depending on whether the file is a 64-bit or 32-bit object). Such padding is not included in descsz.

Our parsing code currently aligns n_namesz. Fix the bug by aligning the start
offset of the descriptor instead. This issue has been benign because the primary
uses of sh_addralign=8 notes are `.note.gnu.property`, where
`sizeof(Elf_Nhdr) + sizeof("GNU") = 16` (already aligned by 8).

In practice, many 64-bit systems incorrectly use sh_addralign=4 notes.
We can use sh_addralign (= p_align) to decide the descriptor padding.
Treat an alignment of 0 and 1 as 4. This approach matches modern GNU readelf
(since 2018).

We have a few tests incorrectly using sh_addralign=0. We may make our behavior
stricter after fixing these tests.

Linux kernel dumped core files use `p_align=0` notes, so we need to support the
case for compatibility.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D150022
2023-05-10 09:36:58 -07:00
Rahman Lavaee
3d6841b2b1 [Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number.
Let Propeller use specialized IDs for basic blocks, instead of MBB number.

This allows optimizations not just prior to asm-printer, but throughout the entire codegen.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.

####Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR.  This is done as follows.
    - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
    - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
    - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization.  Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
    - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
    - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR.  Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
    - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline.  Hence, MBB numbers are not suitable and we need something else.
####Solution
We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block.  It assigns `MachineFunction::NextMBBID` to the MBB ID and then increments it, which ensures having unique IDs.

 To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.

The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.

####Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D100808
2023-01-17 15:25:29 -08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Gregory Alfonso
d22f050e15 Remove redundant .c_str() and .get() calls
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D139485
2022-12-18 00:33:53 +00:00
Rahman Lavaee
96b6ee1bdc Revert "[Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number."
This reverts commit 6015a045d768feab3bae9ad9c0c81e118df8b04a.

Differential Revision: https://reviews.llvm.org/D139952
2022-12-13 11:13:57 -08:00
Rahman Lavaee
6015a045d7 [Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number.
Let Propeller use specialized IDs for basic blocks, instead of MBB number.

This allows optimizations not just prior to asm-printer, but throughout the entire codegen.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.

####Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR.  This is done as follows.
    - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
    - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
    - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization.  Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
    - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
    - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR.  Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
    - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline.  Hence, MBB numbers are not suitable and we need something else.
####Solution
We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block.  It assigns `MachineFunction::NextMBBID` to the MBB ID and then increments it, which ensures having unique IDs.

 To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.

The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.

####Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D100808
2022-12-06 22:50:09 -08:00
Krzysztof Parzyszek
c589730ad5 [YAML] Convert Optional to std::optional 2022-12-06 12:49:32 -08:00
Kazu Hirata
b4482f7ca0 [tools] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:40 -08:00
Kazu Hirata
23a884bbef [obj2yaml] Use std::optional in elf2yaml.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 19:06:49 -08:00
Rahman Lavaee
0aa6df6575 [Propeller] Encode address offsets of basic blocks relative to the end of the previous basic blocks.
This is a resurrection of D106421 with the change that it keeps backward-compatibility. This means decoding the previous version of `LLVM_BB_ADDR_MAP` will work. This is required as the profile mapping tool is not released with LLVM (AutoFDO). As suggested by @jhenderson we rename the original  section type value to `SHT_LLVM_BB_ADDR_MAP_V0` and assign a new value to the `SHT_LLVM_BB_ADDR_MAP` section type. The new encoding adds a version byte to each function entry to specify the encoding version for that function.  This patch also adds a feature byte to be used with more flexibility in the future. An use-case example for the feature field is encoding multi-section functions more concisely using a different format.

Conceptually, the new encoding emits basic block offsets and sizes as label differences between each two consecutive basic block begin and end label. When decoding, offsets must be aggregated along with basic block sizes to calculate the final offsets of basic blocks relative to the function address.

This encoding uses smaller values compared to the existing one (offsets relative to function symbol).
Smaller values tend to occupy fewer bytes in ULEB128 encoding. As a result, we get about 17% total reduction in the size of the bb-address-map section (from about 11MB to 9MB for the clang PGO binary).
The extra two bytes (version and feature fields) incur a small 3% size overhead to the `LLVM_BB_ADDR_MAP` section size.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D121346
2022-06-28 07:42:54 -07:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
serge-sans-paille
290e482342 Cleanup LLVMDWARFDebugInfo
As usual with that header cleanup series, some implicit dependencies now need to
be explicit:

llvm/DebugInfo/DWARF/DWARFContext.h no longer includes:
- "llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h"
- "llvm/DebugInfo/DWARF/DWARFCompileUnit.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
- "llvm/DebugInfo/DWARF/DWARFDebugAranges.h"
- "llvm/DebugInfo/DWARF/DWARFDebugFrame.h"
- "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
- "llvm/DebugInfo/DWARF/DWARFDebugMacro.h"
- "llvm/DebugInfo/DWARF/DWARFGdbIndex.h"
- "llvm/DebugInfo/DWARF/DWARFSection.h"
- "llvm/DebugInfo/DWARF/DWARFTypeUnit.h"
- "llvm/DebugInfo/DWARF/DWARFUnitIndex.h"

Plus llvm/Support/Errc.h not included by a bunch of llvm/DebugInfo/DWARF/DWARF*.h files

Preprocessed lines to build llvm on my setup:
after: 1065629059
before: 1066621848

Which is a great diff!

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119723
2022-02-15 09:16:03 +01:00
Marcelo Juchem
dfb213c2df Fix ambiguous overload build failure
LLVM (llvmorg-14-init) under Debian sid using latest gcc (Debian
10.3.0-9) 10.3.0 fails due to ambiguous overload on operators == and !=:

/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:212:22:
error: ambiguous overload for 'operator!='
(operand types are 'llvm::ELFYAML::ELF_SHF' and 'int')

/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:204:32:
error: ambiguous overload for 'operator!='
(operand types are 'const llvm::yaml::Hex64' and 'int')

/root/src/llvm/src/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp:629:35:
error: ambiguous overload for 'operator=='
(operand types are 'const uint64_t' {aka 'const long unsigned int'} and
'llvm::Register')

Reviewed by: StephenTozer, jmorse, Higuoxing

Differential Revision: https://reviews.llvm.org/D109534
2021-10-01 14:19:57 +01:00
Alexander Yermolovich
a224c5199b [LLD][LLVM] CG Graph profile using relocations
Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not. In non-visible behavior indices point to wrong symbols. The visible behavior indices point outside of Symbol table: "invalid symbol index".

This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols. The Frequency (Weight) will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.

With this approach post processing tools that handle relocations correctly work for this section also. Tools can add/remove symbols and as long as they handle relocation sections with this approach information stays correct.

Doing a quick experiment with clang-13.
The size went up from 107KB to 322KB, aggregate of all the input sections. Size of clang-13 binary is ~118MB. For users of -fprofile-use/-fprofile-sample-use the size of object files will go up slightly, it will not impact final binary size.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D104080
2021-06-24 09:09:33 -07:00
James Henderson
b9ce8ea454 [obj2yaml] Address D104035 review comments
Accidentally missed from commit 5c1639fe064b.

Differential Revision: https://reviews.llvm.org/D104035
2021-06-16 15:01:54 +01:00
James Henderson
5c1639fe06 [yaml2obj][obj2yaml] Support custom ELF section header string table name
This patch adds support for a new field in the FileHeader, which states
the name to use for the section header string table. This also allows
combining the string table with another string table in the object, e.g.
the symbol name string table. The field is optional. By default,
.shstrtab will continue to be used.

This partially fixes https://bugs.llvm.org/show_bug.cgi?id=50506.

Reviewed by: Higuoxing

Differential Revision: https://reviews.llvm.org/D104035
2021-06-16 10:02:23 +01:00
Rahman Lavaee
9f52708660 [obj2yaml,yaml2obj] Add NumBlocks to the BBAddrMapEntry yaml field.
As discussed in D95511, this allows us to encode invalid BBAddrMap
sections to be used in more rigorous testing.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D96831
2021-02-22 18:08:26 -08:00
Rahman Lavaee
0252e6ead1 [obj2yaml,yaml2obj] Add NumBlocks to the BBAddrMapEntry yaml field.
As discussed in D95511, this allows us to encode invalid BBAddrMap
sections to be used in more rigorous testing.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D96831
2021-02-17 15:45:13 -08:00
Alex Richardson
d613d8eb0e [yaml2obj] Handle NT_* string values in for ELF note types
This is required for D74393.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D95953
2021-02-09 16:59:22 +00:00
Rahman Lavaee
f1ff6d210a [obj2yaml, yaml2obj] Use Hex64 for BBAddressMap fields.
This patch let the yaml encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata.
Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D95767
2021-02-01 15:37:30 -08:00
Georgii Rymar
9c89dcf807 [yaml2obj, obj2yaml] - Implement section header table as a special Chunk.
This was discussed in D93678 thread.
Currently we have one special chunk - Fill.

This patch re implements the "SectionHeaderTable" key to become a special chunk too.
With that we are able to place the section header table at any location,
just like we place sections.

Differential revision: https://reviews.llvm.org/D95140
2021-01-25 13:08:08 +03:00
Georgii Rymar
51f4958057 [yaml2obj/obj2yaml] - Improve dumping/creating of ELF versioning sections.
This makes the following improvements.

For `SHT_GNU_versym`:
 * yaml2obj: set `sh_link` to index of `.dynsym` section automatically.
For `SHT_GNU_verdef`:
 * yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
 * yaml2obj: set `sh_info` field automatically.
 * obj2yaml: don't dump the `Info` field when its value matches the number of version definitions.
For `SHT_GNU_verneed`:
 * yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
 * yaml2obj: set `sh_info` field automatically.
 * obj2yaml: don't dump the `Info` field when its value matches the number of version dependencies.

Also, simplifies few test cases.

Differential revision: https://reviews.llvm.org/D94956
2021-01-21 10:36:48 +03:00
Georgii Rymar
d9afe8588e [yaml2obj/obj2yaml] - Refine handling of SHT_GNU_verdef sections.
This patch:
1) Makes `Version`, `Flags`, `VersionNdx` and `Hash` fields to be `Optional<>`.
2) Disallows dumping version definitions that have `vd_version != 1`.
   `vd_version` identifies the version of the structure itself.
   (https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html,
    https://docs.oracle.com/cd/E19683-01/816-7777/chapter6-80869/index.html)
3) Stops dumping default values for `Version`, `Flags`, `VersionNdx` and `Hash` fields.
4) Refines testing.

Differential revision: https://reviews.llvm.org/D94659
2021-01-15 12:40:42 +03:00
Kazu Hirata
4c1617dac8 [llvm] Use std::any_of (NFC) 2021-01-13 19:14:44 -08:00
Georgii Rymar
6d3098e7ff [obj2yaml,yaml2obj] - Refine how we set/dump the sh_entsize field.
This reuses the code from yaml2obj (moves it to ELFYAML.h).
With it we can set the `sh_entsize` in a single place in `obj2yaml`.

Note that it also fixes a bug of `yaml2obj`: we do not
set the `sh_entsize` field for the `SHT_ARM_EXIDX` section properly.

Differential revision: https://reviews.llvm.org/D93858
2021-01-13 11:52:40 +03:00
Georgii Rymar
c15a57cc1a [obj2yaml] - Don't crash when an object has an empty symbol table.
Currently we crash when we have an object with SHT_SYMTAB/SHT_DYNSYM sections
of size 0.

With this patch instead of the crash we start to dump them properly.

Differential revision: https://reviews.llvm.org/D93697
2021-01-12 14:08:59 +03:00
Georgii Rymar
60df7c08b1 [obj2yaml,yaml2obj] - Fix issues with creating/dumping group sections.
We have the following issues related to group sections:
1) yaml2obj is unable to set the custom `sh_entsize` value, because the `EntSize`
   key is currently ignored.
2) obj2yaml is unable to dump the group section which `sh_entsize != 4`.
3) obj2yaml always dumps the "EntSize" for group sections, though
   usually we are trying to omit dumping default values when dumping keys.
   I.e. we should not print the "EntSize" key when `sh_entsize` == 4.

This patch fixes (1),(3) and adds the test case to document the behavior of (2).

Differential revision: https://reviews.llvm.org/D93854
2021-01-12 14:07:42 +03:00
Georgii Rymar
c74751d4b5 [obj2yaml] - Fix the crash in getUniquedSectionName().
`getUniquedSectionName(const Elf_Shdr *Sec)` assumes that
`Sec` is not `nullptr`.

I've found one place in `getUniquedSymbolName` where it is
not true (because of that we crash when trying to dump
unnamed null section symbols).

Patch fixes the crash and changes the signature of the
`getUniquedSectionName` section to accept a reference.

Differential revision: https://reviews.llvm.org/D93754
2021-01-11 15:04:00 +03:00
Georgii Rymar
893c84d71c [obj2yaml] - Dump the content of a broken hash table properly.
This is similar to D93760.

When something is wrong with the hash table header we dump
its context as a raw data.

Currently we have the calculation overflow issue and it is possible to
bypass the validation we have (and crash).

The patch fixes it.

Differential revision: https://reviews.llvm.org/D93799
2020-12-25 11:51:28 +03:00
Georgii Rymar
438bc157a4 [libObject] - Add more ELF types to LLVM_ELF_IMPORT_TYPES_ELFT define (ELFTypes.h).
This allows to get rid of lots for typedefs/usings from many places.

Differential revision: https://reviews.llvm.org/D93801
2020-12-25 11:39:05 +03:00
Georgii Rymar
b8cb1802a8 [obj2yaml] - Dump the content of a broken GNU hash table properly.
When something is wrong with the GNU hash table header we dump
its context as a raw data.

Currently we have the calculation overflow issue and it is possible to
bypass the validation we have (and crash).

The patch fixes it.

Differential revision: https://reviews.llvm.org/D93760
2020-12-24 11:16:31 +03:00
Georgii Rymar
8c2cf89834 [yaml2obj/obj2yaml] - Make Value/Size fields of Symbol optional.
When a field is optional we can use the `=<none>` syntax in macros.
This patch makes `Value`/`Size` fields of `Symbol` optional
and adds test cases for them.

Differential revision: https://reviews.llvm.org/D93010
2020-12-16 13:49:57 +03:00
Georgii Rymar
abae3c1196 [obj2yaml] - Support dumping objects that have multiple SHT_SYMTAB_SHNDX sections.
It is allowed to have multiple `SHT_SYMTAB_SHNDX` sections, though
we currently don't implement it.

The current implementation assumes that there is a maximum of one SHT_SYMTAB_SHNDX
section and that it is always linked with .symtab section.

This patch drops this limitations.

Differential revision: https://reviews.llvm.org/D92644
2020-12-09 12:14:58 +03:00
Georgii Rymar
ffbce65f95 [lib/Object, tools] - Make ELFObjectFile::getELFFile return reference.
We always have an object, so we don't have to return a pointer.

Differential revision: https://reviews.llvm.org/D92560
2020-12-04 16:02:29 +03:00
Georgii Rymar
ea8c8a5097 [obj2yaml] - Teach tool to emit the "SectionHeaderTable" key and sort sections by file offset.
Currently when we dump sections, we dump them in the order,
which is specified in the sections header table.

With that the order in the output might not match the order in the file.
This patch starts sorting them by by file offsets when dumping.

When the order in the section header table doesn't match the order
in the file, we should emit the "SectionHeaderTable" key. This patch does it.

Differential revision: https://reviews.llvm.org/D91249
2020-12-01 12:59:15 +03:00
Georgii Rymar
ee9ffc7345 [obj2yaml] - Dump the EShNum key in some cases.
This patch starts emitting the `EShNum` key, when the `e_shnum = 0`
and the section header table exists.

`e_shnum` might be 0, when the the number of entries in the section header
table is larger than or equal to SHN_LORESERVE (0xff00).
In this case the real number of entries
in the section header table is held in the `sh_size`
member of the initial entry in section header table.

Currently, obj2yaml crashes when an object has `e_shoff != 0` and the `sh_size`
member of the initial entry in section header table is `0`.
This patch fixes it.

Differential revision: https://reviews.llvm.org/D92098
2020-11-27 15:56:10 +03:00
Georgii Rymar
c2090ff594 [obj2yaml] - Don't assert when trying to calculate the expected section offset.
The following line asserts when `sh_addralign > MAX_UINT32 && (uint32_t)sh_addralign == 0`:

```
    ExpectedOffset = alignTo(ExpectedOffset,
                             SecHdr.sh_addralign ? SecHdr.sh_addralign : 1);
```

it happens because `sh_addralign` is truncated to 32-bit value, but `alignTo`
doesn't accept `Align == 0`. We should change `1` to `1uLL`.

Differential revision: https://reviews.llvm.org/D92163
2020-11-27 15:38:22 +03:00
Georgii Rymar
5edb90c927 [obj2yaml] - Dump section offsets in some cases.
Currently we never dump the `sh_offset` key.
Though it sometimes an important information.

To reduce the noise this patch implements the following logic:
1) The "Offset" key for the first section is always emitted.
2) If we can derive the offset for a next section naturally,
   then the "Offset" key is omitted.

By "naturally" I mean that section[X] offset is expected to be:
```
offsetOf(section[X]) == alignTo(section[X - 1].sh_offset + section[X - 1].sh_size, section[X].sh_addralign)
```

So, when it has the expected value, we omit it from the output.

Differential revision: https://reviews.llvm.org/D91152
2020-11-25 12:41:01 +03:00
Georgii Rymar
a7a447be0f [yaml2obj] - ProgramHeaders: introduce FirstSec/LastSec instead of Sections list.
Imagine we have a YAML declaration of few sections: `foo1`, `<unnamed 2>`, `foo3`, `foo4`.

To put them into segment we can do (1*):

```
Sections:
 - Section: foo1
 - Section: foo4
```

or we can use (2*):

```
Sections:
 - Section: foo1
 - Section: foo3
 - Section: foo4
```

or (3*) :

```
Sections:
 - Section: foo1
## "(index 2)" here is a name that we automatically created for a unnamed section.
 - Section: (index 2)
 - Section: foo3
 - Section: foo4
```

It looks really confusing that we don't have to list all of sections.

At first I've tried to make this rule stricter and report an error when there is a gap
(i.e. when a section is included into segment, but not listed explicitly).
This did not work perfect, because such approach conflicts with unnamed sections/fills (see (3*)).

This patch drops "Sections" key and introduces 2 keys instead: `FirstSec` and `LastSec`.
Both are optional.

Differential revision: https://reviews.llvm.org/D90458
2020-11-09 13:00:50 +03:00
Rahman Lavaee
82e7c4ce45 [obj2yaml] [yaml2obj] Add yaml support for SHT_LLVM_BB_ADDR_MAP section.
YAML support allows us to better test the feature in the subsequent patches. The implementation is quite similar to the .stack_sizes section.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D88717
2020-11-06 12:44:42 -08:00
Georgii Rymar
2bfaf19516 [yaml2obj] - Make Section::Link field to be Optional<>.
`Link` is not an optional field currently.
Because of this it is not convenient to write macros.

This makes it optional and fixes corresponding test cases.

Differential revision: https://reviews.llvm.org/D90390
2020-10-30 16:18:53 +03:00
Georgii Rymar
400103f3d5 [yaml2obj/obj2yaml] - Add support of 'Size' and 'Content' keys for all sections.
Many sections either do not have a support of `Size`/`Content` or support just a
one of them, e.g only `Content`.

`Section` is the base class for sections. This patch adds `Content` and `Size` members
to it and removes similar members from derived classes. This allows to cleanup and
generalize the code and adds a support of these keys for all sections (`SHT_MIPS_ABIFLAGS`
is a only exception, it requires unrelated specific changes to be done).

I had to update/add many tests to test the new functionality properly.

Differential revision: https://reviews.llvm.org/D89039
2020-10-15 11:11:41 +03:00
Georgii Rymar
82311766d9 [obj2yaml] - Rename Group to GroupSection. NFC.
The `Group` class represents a group section and it is
named inconsistently with other sections which all has
the "Section" suffix. It is sometimes confusing,
this patch addresses the issue.

Differential revision: https://reviews.llvm.org/D88892
2020-10-07 17:04:15 +03:00
Georgii Rymar
5829dc9250 [yaml2obj][elf2yaml] - Add a support for the EntSize field for SHT_HASH sections.
Specification  for SHT_HASH table says (https://refspecs.linuxbase.org/elf/gabi4+/ch5.dynamic.html#hash)
that it contains Elf32_Word entries for both 32/64 bit objects.

Currently both GNU linkers and LLD sets the `sh_entsize` field to `4`.

At the same time, `yaml2obj` ignores the `EntSize` field for SHT_HASH sections.
This patch fixes this and also adds a support for obj2yaml: it will not
dump this field when the `sh_entsize` contains the default value (`4`).

Differential revision: https://reviews.llvm.org/D88652
2020-10-02 12:01:50 +03:00