This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
```
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};
```
We only have 30 instances that rely on this "redirection". Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.
I'm planning to remove the redirection eventually.
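For illustration, a typical call site changes only the type it names; the interface is identical either way, precisely because of the redirection above (a sketch, not a line from the patch):
```
#include "llvm/ADT/SmallPtrSet.h"

void example(int *p) {
  // Before: llvm::SmallSet<int *, 8> visited;
  llvm::SmallPtrSet<int *, 8> visited;
  visited.insert(p);      // same insert/count/erase interface either way,
  (void)visited.count(p); // since SmallSet<T *, N> already was a SmallPtrSet
}
```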
Clear `synthesizedAligns` to prevent stray relocations to an unrelated
text section. Enhance the test to check llvm-readelf -r output.
---
Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:
```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```
When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.
To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.
For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.
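A rough sketch of the padding computation this implies; the helper name and the general clamping formula are assumptions extrapolated from the description, not the patch's code:
```
#include <algorithm>

// For a text section with alignment >= 4, insert this many NOP bytes at the
// section start and synthesize R_RISCV_ALIGN with this byte count as the
// addend; the linker then deletes whatever is excess.
unsigned synthesizedAlignPadding(unsigned alignment, bool rvc) {
  if (alignment < 4)
    return 0;
  unsigned minNop = rvc ? 2 : 4; // smallest encodable NOP
  // For simplicity, a non-RVC 4-byte-aligned section still gets addend 2,
  // letting the linker trim the excess 2 bytes.
  return std::max(alignment - minNop, 2u);
}
```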
See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236
This reverts commit 6f53f1c8d2bdd13e30da7d1b85ed6a3ae4c4a856.
synthesizedAligns is not cleared, leading to stray relocations for
unrelated sections. Revert for now.
Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:
```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```
When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.
To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.
For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.
See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236
Pull Request: https://github.com/llvm/llvm-project/pull/151639
When using Temporal Profiling with the BP algorithm, we encounter an
issue with internal function reordering. In cases where the symbol
table contains entries like:
```
Symbol table '.symtab' contains 45 entries:
Num: Value Size Type Bind Vis Ndx Name
10: 0000000000000000 0 SECTION LOCAL DEFAULT 18 .text.L1
11: 0000000000000000 24 FUNC LOCAL DEFAULT 18 L1
```
The zero-sized section symbol .text.L1 gets stored in the secToSym map
first. However, when the function lookup searches for L1 (as seen in
[BPSectionOrdererBase.inc:191](https://github.com/llvm/llvm-project/blob/main/lld/include/lld/Common/BPSectionOrdererBase.inc#L191)),
it fails to find the correct entry in rootSymbolToSectionIdxs because
the section symbol has already claimed that slot.
This patch fixes the issue by skipping zero-sized symbols during the
addSections process, ensuring that function symbols are properly
registered for lookup.
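A self-contained sketch of the skip, with toy types standing in for lld's (rootSymbolToSectionIdxs and addSections are names from the description; the rest is assumed):
```
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Sym { std::string name; uint64_t size; };

// Skip zero-sized symbols so that a zero-sized SECTION symbol (.text.L1)
// cannot claim the slot needed by the FUNC symbol (L1).
void addSections(const std::vector<Sym> &syms, unsigned sectionIdx,
                 std::map<std::string, std::set<unsigned>> &rootSymbolToSectionIdxs) {
  for (const Sym &s : syms) {
    if (s.size == 0)
      continue; // zero-sized: would shadow the real function symbol
    rootSymbolToSectionIdxs[s.name].insert(sectionIdx);
  }
}
```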
The fixed-point layout algorithm handles linker scripts, thunks, and
relaxOnce (to suppress out-of-range GOT-indirect-to-PC-relative
optimization). These passes are not needed for relocatable links because
they require address information that is not yet available.
Since we don't scan relocations for relocatable links, the
`createThunks` and `relaxOnce` functions are no-ops anyway, making these
passes redundant.
To prevent cluttering the line history, I place the `if (...) break;`
inside the for loop.
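A self-contained sketch of that control flow; the condition and helper names are stand-ins, not lld's code:
```
// Hypothetical helper: one iteration of script/thunk/relaxOnce work.
bool runOneLayoutPass();

void fixedPointLayout(bool relocatable) {
  for (;;) {
    bool changed = runOneLayoutPass();
    // The break lives inside the loop rather than wrapping it, keeping the
    // line history of the surrounding code intact. For -r links, thunk
    // creation and relaxOnce are no-ops, so a single pass suffices.
    if (relocatable)
      break;
    if (!changed)
      break; // addresses converged: fixed point reached
  }
}
```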
Pull Request: https://github.com/llvm/llvm-project/pull/152240
This patch adds support for bitcode members of thin archives to DTLTO
(https://llvm.org/docs/DTLTO.html) in ELF LLD.
For DTLTO, bitcode identifiers must be valid paths to bitcode files on
disk. Clang does not support archive inputs for ThinLTO backend
compilations. This patch adjusts the identifier for bitcode members of
thin archives in DTLTO links so that it is the path to the member file
on disk, allowing such members to be supported in DTLTO.
This patch is sufficient to allow for self-hosting an LLVM build with
DTLTO when thin archives are used.
Note: Bitcode members of non-thin archives remain unsupported. This will
be addressed in a future change.
Testing:
- LLD lit test coverage has been added to check that the identifier is
adjusted appropriately.
- A cross-project lit test has been added to show that a DTLTO link can
succeed when linking bitcode members of thin archives.
For the design discussion of the DTLTO feature, see: #126654.
On LoongArch, we perform the GOT-indirection-to-PC-relative optimization
in the normal or medium code model, with or without an accompanying
R_LARCH_RELAX relocation.
From:
* pcalau12i $a0, %got_pc_hi20(sym_got)
* ld.w/d $a0, $a0, %got_pc_lo12(sym_got)
To:
* pcalau12i $a0, %pc_hi20(sym)
* addi.w/d $a0, $a0, %pc_lo12(sym)
If the original code sequence can be relaxed into a single `pcaddi`
instruction, this optimization is not applied (see
https://github.com/llvm/llvm-project/pull/123566).
The GOT-related optimization is split across two locations because the
`relax()` function is part of an iterative fixed-point algorithm, and we
should minimize the work done there for better linker performance.
Note: Although the optimization has been performed, the GOT entries
still exist, similar to AArch64. Eliminating the entries would increase
code complexity.
DynamicReloc::AgainstSymbol is now true and DynamicReloc::AddendOnly is
now false; uses of the constants were replaced mechanically.
Reviewers: rnk, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150813
Aside from Computed, Kind is now just AddendOnly and AgainstSymbol, so
it's really just a bool reflecting whether the resulting ELF relocation
should reference the symbol or not. Refactor DynamicReloc's storage to
reflect this, splitting Computed out into its own orthogonal isFinal
bool. As part of this, rename computeRaw to finalize to reflect that
it's side-effecting.
This also allows needsDynSymIndex() to work even after finalize(), so
drop the existing assertion.
A future commit will refactor the DynamicReloc API to take
isAgainstSymbol directly, now that the enum serves little purpose, as a
more invasive, mechanical change. For this commit we keep
DynamicReloc::Kind as the external API.
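A minimal sketch of the resulting storage; isFinal is named in the description, while the other member name is assumed:
```
class DynamicReloc {
  // ...
  bool isAgainstSymbol;  // should the ELF relocation reference the symbol?
  bool isFinal = false;  // set by finalize() (formerly computeRaw)
};
```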
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150812
This second constructor is just a shorthand for an AddendOnly relocation
against dummySym with R_ADDEND, so write it as such.
Reviewers: arichardson, MaskRay
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150811
Instead of having a special DynamicReloc::Kind, we can just use a new
RelExpr for the needed calculation. The only odd thing we do to allow
this is keep a representative symbol for the OutputSection in question
(the first one we see for it) to use in this relocation's addend
calculation.
This reduces DynamicReloc to just AddendOnly vs AgainstSymbol, plus the
internal Computed.
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150810
The former is just a special case of the latter, ignoring the expr and
always just using the addend. If we use R_ADDEND as expr (which
previously had no effect, and so was misleadingly R_ABS not R_ADDEND in
all but one use) then we don't need to maintain this as a separate case.
Aside from the internal Computed Kind, this just leaves MipsMultiGotPage
as a special case; the only difference between the other two Kind values
is what needsDynSymIndex returns.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150798
The former is just a special case of the latter, ignoring the expr and
always just using the addend, allowing (and enforcing) that sym is null.
If we just use dummySym then we don't need to maintain this as a
separate case, since R_ADDEND will return the addend unmodified for the
call to getRelocTargetVA.
Reviewers: MaskRay, arichardson
Reviewed By: MaskRay, arichardson
Pull Request: https://github.com/llvm/llvm-project/pull/150797
Currently we set the kind to AddendOnly in computeRaw() in order to
catch cases where we're not treating the DynamicReloc as computed.
Specifically, computeAddend() will then assert that sym is nullptr, and
so can catch any subsequent calls for relocations that have sym set.
However, if the DynamicReloc was already AddendOnly (or
MipsMultiGotPage), we will silently allow this, which does work
correctly, but is not the intended use. We also cannot catch cases where
needsDynSymIndex() is called after this point, which would give a
misleading value if the kind were previously against a symbol.
By introducing a new (internal) Computed kind we can be explicit and add
more rigorous assertions, rather than abusing AddendOnly.
Reviewers: arichardson, MaskRay
Reviewed By: arichardson, MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150799
This ensures subsequent calls to elf::postScanRelocations with a new Ctx
will correctly use an instance with the right internalFile (with the old
one presumably deleted, even). It also avoids having to create a new
instance in elf::getErrorPlace, and will allow more uses of such a dummy
symbol in future commits.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/150796
Linker relaxation to R_LARCH_GOT_PC_{HI20,LO12} is only possible when
the addend of the relocation is zero.
Note: For `ld.bfd`, GOT references with non-zero addends will trigger an
assert in LoongArch, but `lld` handles these cases without any errors.
```
ld.bfd: BFD (GNU Binutils) 2.44.0 assertion fail
/usr/src/debug/binutils/binutils-gdb/bfd/elfnn-loongarch.c:4248
```
The current implementation is dangerous if used in contexts that need a
single statement, since `invokeELFT(...);` is in fact two statements: a
switch statement and an empty statement.
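The hazard is easy to reproduce with any macro of the same shape; the macros below are illustrative stand-ins, not lld's actual definition, and the do-while(0) remedy shown is the classic fix rather than necessarily what the patch does:
```
#define INVOKE_BAD(fn) switch (kind) { case 64: fn(); break; }
#define INVOKE_OK(fn)                                                         \
  do {                                                                        \
    switch (kind) { case 64: fn(); break; }                                   \
  } while (0)

void run();
void fallback();

void dispatch(int kind, bool cond) {
  // if (cond) INVOKE_BAD(run); else fallback();
  //   does not compile: the expansion ends with `}`, so the trailing `;`
  //   becomes an empty statement and the `else` loses its matching `if`.
  if (cond)
    INVOKE_OK(run); // a single statement, so if/else nests as expected
  else
    fallback();
}
```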
This patch enables lld to read AArch64 Build Attributes and convert them
into GNU Properties.
Changes:
- Parses AArch64 Build Attributes from input object files.
- Converts known attributes into corresponding GNU Properties.
- Merges attributes when linking multiple objects.
Spec reference:
https://github.com/ARM-software/abi-aa/pull/230/files#r1030
---------
Co-authored-by: Sivan Shani <sivan.shani@arm.com>
PR #148920 was merged before I could share my comments.
* Fix the test filename. There are other minor suggestions, which can be
done in #148985.
* Make `isRelRoDataSection` concise, to be consistent with the majority of
helper functions.
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition a static data section (like `.data.rel.ro`) into
two sections, one grouping the cold ones and the other grouping the
rest.
lld requires all RELRO sections to be contiguous. To place
`.data.rel.ro.unlikely` in the middle of the RELRO sections, this change
adds `.data.rel.ro.unlikely` explicitly as a RELRO section.
---------
Co-authored-by: Sam Elliott <quic_aelliott@quicinc.com>
Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.
When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.
The relocations that will be extended by thunks are:
* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL:
±8 MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes
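The ranges follow from each branch encoding a signed, 4-byte-scaled word offset in an N-bit field; a rough check might look like this sketch (not lld's code, and the quoted bounds are slightly tighter at the edges):
```
#include "llvm/Support/MathExtras.h"

// An N-bit field scaled by 4 reaches roughly +/- 2^(N+1) bytes, e.g.
// N=22 gives +/- 8 MiB for B22_PCREL, N=9 gives about +/- 1 KiB.
static bool hexagonBranchInRange(unsigned fieldBits, int64_t byteOffset) {
  return llvm::isIntN(fieldBits + 2, byteOffset); // +2 for the x4 scaling
}
```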
Fixes #149689
---------
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
* Fix `.reloc constant` to mean `section_symbol+constant` instead of
`.+constant`. The initial .reloc support from MIPS incorrectly
interpreted the offset.
* Delay the evaluation of the offset expression until after
MCAssembler::layout, deleting a lot of code working with MCFragment.
* Delete many FIXMEs from https://reviews.llvm.org/D79625
* Some lld/ELF/Arch/LoongArch.cpp relaxation tests rely on `.reloc .,
R_LARCH_ALIGN` generating ALIGN relocations at specific locations.
Sort the relocations.
Merge the attributes of object files being linked together. The
`.hexagon.attributes` section can be used by loaders and analysis tools.
This is similar to .riscv.attributes, introduced in
8a900f2438b4a167b98404565ad4da2645cc9330 /
https://reviews.llvm.org/D138550.
The linker was crashing due to stack overflow when parsing ':ALIGN' in
an output section description. This commit fixes the linker script
parser so that the crash does not happen.
The root cause of the stack overflow is how we parse expressions
(readExpr) in linker script and the behavior of ScriptLexer::expect(...)
utility. ScriptLexer::expect does not do anything if errors have already
been encountered during linker script parsing. In particular, it never
increments the current token position in the script file, even if the
current token is the same as the expected token. This causes an infinite
call cycle on parsing an expression such as '(4096)' when an error has
already been encountered.
```
readExpr() calls readPrimary()
readPrimary() calls readParenExpr()
readParenExpr():
  expect("("); // no-op, current token still points to '('
  Expression *E = readExpr(); // the cycle continues...
```
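One way to break the cycle is simply to stop descending once an error has been diagnosed; this is a sketch of that shape under the names above (the hypothetical errorCount() guard may differ from the actual fix):
```
Expression *ScriptParser::readParenExpr() {
  if (errorCount())
    return nullptr; // a prior error means expect() cannot make progress,
                    // so bail out instead of recursing forever
  expect("(");
  Expression *e = readExpr();
  expect(")");
  return e;
}
```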
Closes #146722
Signed-off-by: Parth Arora <partaror@qti.qualcomm.com>
This patch introduces support for Integrated Distributed ThinLTO (DTLTO)
in ELF LLD.
DTLTO enables the distribution of ThinLTO backend compilations via
external distribution systems, such as Incredibuild, during the
traditional link step: https://llvm.org/docs/DTLTO.html.
It is expected that users will invoke DTLTO through the compiler driver
(e.g., Clang) rather than calling LLD directly. A Clang-side interface
for DTLTO will be added in a follow-up patch.
Note: Bitcode members of archives (thin or non-thin) are not currently
supported. This will be addressed in a future change. As a consequence
of this lack of support, this patch is not sufficient to allow for
self-hosting an LLVM build with DTLTO. Theoretically,
--start-lib/--end-lib could be used instead of archives in a self-host
build. However, it's unclear how --start-lib/--end-lib can be easily
used with the LLVM build system.
Testing:
- ELF LLD `lit` test coverage has been added, using a mock distributor
to avoid requiring Clang.
- Cross-project `lit` tests cover integration with Clang.
For the design discussion of the DTLTO feature, see: #126654.
Support TLSDESC to initial-exec (IE) or local-exec (LE) optimizations.
Introduce a new hook, RE_LOONGARCH_RELAX_TLS_GD_TO_IE_PAGE_PC, and use
the existing R_RELAX_TLS_GD_TO_IE_ABS to support TLSDESC => IE, and the
existing R_RELAX_TLS_GD_TO_LE to support TLSDESC => LE.
In the normal or medium code model, there are two forms of code sequences:
* pcalau12i $a0, %desc_pc_hi20(sym_desc)
* addi.d $a0, $a0, %desc_pc_lo12(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
------
* pcaddi $a0, %desc_pcrel_20(sym_desc)
* ld.d $ra, $a0, %desc_ld(sym_desc)
* jirl $ra, $ra, %desc_call(sym_desc)
Convert to IE:
* pcalau12i $a0, %ie_pc_hi20(sym_ie)
* ld.[wd] $a0, $a0, %ie_pc_lo12(sym_ie)
Convert to LE:
* lu12i.w $a0, %le_hi20(sym_le) # le_hi20 != 0, otherwise NOP
* ori $a0, src, %le_lo12(sym_le) # le_hi20 != 0, src = $a0, otherwise src = $zero
For simplicity, for both tlsdescToIe and tlsdescToLe we always convert
the preceding instructions to NOPs, because both forms of code sequence
(corresponding to the relocation combinations
R_LARCH_TLS_DESC_PC_HI20+R_LARCH_TLS_DESC_PC_LO12 and
R_LARCH_TLS_DESC_PCREL20_S2) can then share the same processing.
TODO: When relaxation is enabled, the redundant NOPs can be removed.
This will be implemented in a future patch.
Note: The different forms of TLSDESC code sequences should not appear
interleaved in the normal, medium, or extreme code model; compilers do
not generate such code, and lld does not support it. This is thanks to
the guard in PostRASchedulerList.cpp in llvm:
```
Calls are not scheduling boundaries before register allocation,
but post-ra we don't gain anything by scheduling across calls
since we don't need to worry about register pressure.
```
For non-SHF_ALLOC sections, sh_addr is set to 0.
Skip sections without the SHF_ALLOC flag, so that `minVA` is not clamped
to 0 by non-SHF_ALLOC sections and the size of non-SHF_ALLOC sections
does not contribute to `maxVA`.
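A self-contained sketch of the fixed scan with toy types (names assumed):
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct Sec { uint64_t flags, addr, size; };
constexpr uint64_t SHF_ALLOC = 0x2; // ELF section flag

// Non-SHF_ALLOC sections have sh_addr == 0, so including them would clamp
// minVA to 0 and inflate maxVA by their sizes.
void computeRange(const std::vector<Sec> &secs, uint64_t &minVA,
                  uint64_t &maxVA) {
  for (const Sec &s : secs) {
    if (!(s.flags & SHF_ALLOC))
      continue;
    minVA = std::min(minVA, s.addr);
    maxVA = std::max(maxVA, s.addr + s.size);
  }
}
```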
Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.
This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.
Original commit message follows:
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Reviewers: zmodem, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/145579
This caused assertion failures in applyBranchToBranchOpt():
```
llvm/include/llvm/Support/Casting.h:578:
decltype(auto) llvm::cast(From*)
[with To = lld::elf::InputSection; From = lld::elf::InputSectionBase]:
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
```
See comment on the PR (https://github.com/llvm/llvm-project/pull/138366)
This reverts commit 491b82a5ec1add78d2c93370580a2f1897b6a364.
This also reverts the follow-up "[lld] Use llvm::partition_point (NFC) (#145209)"
This reverts commit 2ac293f5ac4cf65c0c038bf75a88f1d6715e467d.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:
CFI enabled: +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]
The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
This optimization is implemented for AArch64 and X86_64 only.
lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:
```
N Min Max Median Avg Stddev
x 512 1.2264546 1.3481076 1.2970261 1.2965788 0.018620888
+ 512 1.2561196 1.3839965 1.3214632 1.3209327 0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
Pull Request: https://github.com/llvm/llvm-project/pull/138366
Include the offset of a thunk in the ThunkSection when adding symbols.
At Thunk creation time the offset is set to 0 as we don't know where in
the ThunkSection the Thunk will end up. The symbol values are updated by
the setOffset() call in assignOffsets().
When we transform a thunk from a short one to a long one, we sometimes
add a mapping symbol. At this point the offset of the thunk is non-zero,
and we need to account for that when defining the symbol, as the
setOffset() call subtracts the old offset before adding the new one back
in.
To test, a second thunk that is converted to a long thunk was added to
aarch64-thunk-bit-multipass. This second thunk is given a non-zero
offset from the start of the ThunkSection so that we can observe the
mapping symbol being placed in the wrong location when the offset is not
accounted for.
Fixes: https://github.com/llvm/llvm-project/issues/142326
+ If `-z zicfilp=implicit` or the option is not specified, the output
has the ZICFILP feature enabled/disabled based on the input objects.
+ If `-z zicfilp=<never|unlabeled|func-sig>`, the output has the
ZICFILP feature forced <off|on to the "unlabeled" scheme|on to the
"func-sig" scheme>.
+ If `-z zicfiss=implicit` or the option is not specified, the output
has the ZICFISS feature enabled/disabled based on the input objects.
+ If `-z zicfiss=<never|always>`, the output has the ZICFISS feature
forced <off|on>.
Previously, the AArch64 PAuth ABI core values were stored as an
ArrayRef<uint8_t>, introducing unnecessary indirection.
This patch replaces the ArrayRef with two explicit uint64_t fields:
aarch64PauthAbiPlatform and aarch64PauthAbiVersion. This simplifies the
representation and improves readability.
No functional change intended, aside from improved error messages.
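A sketch of the new representation; the field names come from the description, while the enclosing struct is illustrative:
```
#include <cstdint>

struct Config {
  // Previously a single ArrayRef<uint8_t> holding both values indirectly.
  uint64_t aarch64PauthAbiPlatform = 0;
  uint64_t aarch64PauthAbiVersion = 0;
};
```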
The behavior of an undefined weak reference is implementation defined.
For static -no-pie linking, dynamic relocations are generally avoided (except
IRELATIVE). -shared linking generally emits dynamic relocations.
Dynamic -no-pie linking and -pie allow flexibility. Past changes
adjusted the behavior for better consistency and a simpler internal
representation, e.g. https://reviews.llvm.org/D63003 and
https://reviews.llvm.org/D105164 (generalized to undefined non-weak in
2fcaa00d1e2317a90c9071b735eb0e758b5dd58b).
GNU ld introduced -z [no]dynamic-undefined-weak option to fine-tune the
behavior. (The option is not very effective with -no-pie, e.g. on
x86-64, `ld.bfd a.o s.so -z dynamic-undefined-weak` generates
R_X86_64_NONE relocations instead of GLOB_DAT/JUMP_SLOT)
This patch implements -z [no]dynamic-undefined-weak option.
The effects are summarized as follows:
* Static -no-pie: no-op
* Dynamic -no-pie: nodynamic-undefined-weak suppresses GLOB_DAT/JUMP_SLOT
* Static -pie: dynamic-undefined-weak generates ABS/GLOB_DAT/JUMP_SLOT.
https://discourse.llvm.org/t/lld-weak-undefined-symbols-in-vdso-only/86749
* Dynamic -pie: nodynamic-undefined-weak suppresses ABS/GLOB_DAT/JUMP_SLOT
The -pie behavior likely stays stable while -no-pie (`!ctx.arg.isPic` in
`isStaticLinkTimeConstant`) behavior will likely change in the future.
The current default value of ctx.arg.zDynamicUndefined is selected to
prevent behavior changes.
Pull Request: https://github.com/llvm/llvm-project/pull/143831
This way, when mixing small and large text, the large text stays out of
the way of the rest of the binary.
Place large RX sections at the beginning rather than at the end so that
with `--no-rosegment`, the large text and rodata share a single PT_LOAD
segment. Place large RWX sections at the end to keep writable and
readonly sections separate.
Clang started emitting the large section flag for `.ltext` sections in
#73037.
* Merge the special case into isStaticLinkTimeConstant.
* Generalize isUndefWeak to isUndefined. Undefined non-weak is an error
case; we choose to be general, which also brings us in line with GNU ld.
Increase specificity by using the correct unit sizes. 'KBytes' is an
abbreviation for kB (1000 bytes), and the hardware industry as well as
several operating systems have now switched to 1000-byte kBs.
If this change is acceptable: GitHub sometimes mangles merges to use the
account's original email, and $dayjob asks that contributions carry my
work email. Thanks!
The new test (derived from riscv32 openssl/test/cmp_msg_test.c) revealed
oscillation in two R_RISCV_CALL_PLT jumps:
- First jump (~2^11 bytes away): alternated between 4 and 8 bytes.
- Second jump (~2^20 bytes away): alternated between 2 and 8 bytes.
The issue is not related to alignment. In 2019, GNU ld addressed a
similar problem by reducing the relaxation allowance for cross-section
relaxation (https://sourceware.org/bugzilla/show_bug.cgi?id=25181).
This approach would result in a suboptimal layout for the tight range
tested by riscv-relax-call.s.
This patch stabilizes the process by preventing `remove` from increasing
after a few passes, similar to the integrated assembler's fragment
relaxation.
(For the Android bit reproducer, `pass < 2` leads to a non-optimal
layout, while `pass < 3` and `pass < 4` produce identical output.)
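Conceptually, the damping looks like this sketch, keyed to the pass counts above (the names and exact threshold are assumptions, not lld's code):
```
#include <algorithm>
#include <cstdint>

// After the first few passes, a relocation's `remove` value may no longer
// grow, so two call sites cannot keep re-triggering each other's relaxation.
uint32_t stabilizedRemove(int pass, uint32_t candidate, uint32_t previous) {
  if (pass < 3)
    return candidate;                   // early passes: relax freely
  return std::min(candidate, previous); // later passes: no further growth
}
```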
Fix https://github.com/llvm/llvm-project/issues/113838
Possibly fix https://github.com/llvm/llvm-project/issues/123248 (inputs
are bitcode, subject to ever-changing code generation, not reproducible)
Pull Request: https://github.com/llvm/llvm-project/pull/142899