Currently, obj2yaml doesn't emit the offset of program headers, leaving
it to yaml2obj to calculate offsets based on `FirstSec` and `LastSec`.
This causes an obj2yaml->yaml2obj round trip to often produce an ELF
file that is not equivalent to the original, especially since it seems
common to have program headers at offset 0 whose first section starts at
a higher address. Besides being non-equivalent, the produced ELF files
also do not seem to work propery and readelf complains about them.
Taking a simple hello world program in C, compiled using either GCC or
Clang, the original ELF file has the following program headers (only
showing some relevant ones):
```
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000630 0x0000000000000630 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000161 0x0000000000000161 R E 0x1000
...
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .text .fini
...
```
While this is the result of an obj2yaml->yaml2obj round trip:
```
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000000 0x0000000000000040 0x0000000000000040
0x0000000000000000 0x0000000000000000 R 0x8
readelf: Error: the PHDR segment is not covered by a LOAD segment
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000318 0x0000000000000000 0x0000000000000000
0x0000000000000318 0x0000000000000318 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000161 0x0000000000000161 R E 0x1000
...
Section to Segment mapping:
Segment Sections...
00
01 .interp
02
03 .init .plt .text .fini
...
```
Note that the offset of segment 2 changed from 0x0 to 0x318. This has
two effects:
- readelf complains "Error: the PHDR segment is not covered by a LOAD
segment" since PHDR was originally covered by segment 2 but not
anymore;
- Segment 2 effectively became empty according to the section to segment
mapping.
I addition to these, the output doesn't correctly execute anymore,
crashing with a "SIGSEGV (Address boundary error)".
This patch fixes the difference in program header layout after a round
trip by explicitly emitting offsets.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D145555
When we produce an YAML output, we also print leading zeroes currently.
An output might look like this:
```
- Name: .dynsym
Type: SHT_DYNSYM
Address: 0x0000000000001000
EntSize: 0x0000000000000018
```
There are probably no reason to print leading zeroes.
It just makes harder to read values. This patch stops printing them.
The output becomes like:
```
- Name: .dynsym
Type: SHT_DYNSYM
Address: 0x1000
EntSize: 0x18
```
This affects obj2yaml mostly, but also dsymutil and llvm-xray tools output.
Differential revision: https://reviews.llvm.org/D90930
Imagine we have a YAML declaration of few sections: `foo1`, `<unnamed 2>`, `foo3`, `foo4`.
To put them into segment we can do (1*):
```
Sections:
- Section: foo1
- Section: foo4
```
or we can use (2*):
```
Sections:
- Section: foo1
- Section: foo3
- Section: foo4
```
or (3*) :
```
Sections:
- Section: foo1
## "(index 2)" here is a name that we automatically created for a unnamed section.
- Section: (index 2)
- Section: foo3
- Section: foo4
```
It looks really confusing that we don't have to list all of sections.
At first I've tried to make this rule stricter and report an error when there is a gap
(i.e. when a section is included into segment, but not listed explicitly).
This did not work perfect, because such approach conflicts with unnamed sections/fills (see (3*)).
This patch drops "Sections" key and introduces 2 keys instead: `FirstSec` and `LastSec`.
Both are optional.
Differential revision: https://reviews.llvm.org/D90458
Currently we have to set 'Machine' to something in our
YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets
and 'EM_386' for 32-bit targets. At the same time, in fact, in most
cases our tests do not need a machine type and we can use
'EM_NONE'.
This is cleaner, because avoids the need of using a particular machine.
In this patch I've made the 'Machine' key optional (the default value,
when it is not specified is `EM_NONE`) and removed it (where possible)
from yaml2obj, obj2yaml and llvm-readobj tests.
There are few tests left where I decided not to remove it, because
I didn't want to touch CHECK lines or doing anything more complex
than a removing a "Machine: *" line and formatting lines around.
Differential revision: https://reviews.llvm.org/D86202
This teaches yaml2obj to allocate file space for a no-bits section
when there is a non-nobits section in the same segment that follows it.
It was discussed in D78005 thread and matches GNU linkers and LLD behavior.
Differential revision: https://reviews.llvm.org/D80629
For describing section/symbol names we can use unique suffixes,
e.g:
```
- Name: '.foo [1]`
- Name: '.foo [2]`
```
It can be a problem (see https://reviews.llvm.org/D79984#inline-734829),
because `[]` are sometimes used to describe a macros:
```
- Name: "[[a0]]"
```
Seems the better approach is to use something else, like "()".
This patch does it and refactors the code related.
Differential revision: https://reviews.llvm.org/D80123