llvm-project

Author	SHA1	Message	Date
Daniel Thornburgh	66466ff151	Reland: [LLD] Implement --enable-non-contiguous-regions (#90007) When enabled, input sections that would otherwise overflow a memory region are instead spilled to the next matching output section. This feature parallels the one in GNU LD, but there are some differences from its documented behavior: - /DISCARD/ only matches previously-unmatched sections (i.e., the flag does not affect it). - If a section fails to fit at any of its matches, the link fails instead of discarding the section. - The flag --enable-non-contiguous-regions-warnings is not implemented, as it exists to warn about such occurrences. The implementation places stubs at possible spill locations, and replaces them with the original input section when effecting spills. Spilling decisions occur after address assignment. Sections are spilled in reverse order of assignment, with each spill naively decreasing the size of the affected memory regions. This continues until the memory regions are brought back under size. Spilling anything causes another pass of address assignment, and this continues to fixed point. Spilling after rather than during assignment allows the algorithm to consider the size effects of unspillable input sections that appear later in the assignment. Otherwise, such sections (e.g. thunks) may force an overflow, even if spilling something earlier could have avoided it. A few notable feature interactions occur: - Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the input section were actually placed there. - SHF_MERGE synthetic sections use the spill list of their first contained input section (the one that gives the section its name). - ICF occurs oblivious to spill sections; spill lists for merged-away sections become inert and are removed after assignment. - SHF_LINK_ORDER and .ARM.exidx are ordered according to the final section ordering, after all spilling has completed. - INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.	2024-05-13 11:06:54 -07:00
Daniel Thornburgh	81f34afa5c	Revert "[LLD] Implement --enable-non-contiguous-regions" (#92005) Reverts llvm/llvm-project#90007 Broke in merging I think.	2024-05-13 10:38:40 -07:00
Daniel Thornburgh	673114447b	[LLD] Implement --enable-non-contiguous-regions (#90007) When enabled, input sections that would otherwise overflow a memory region are instead spilled to the next matching output section. This feature parallels the one in GNU LD, but there are some differences from its documented behavior: - /DISCARD/ only matches previously-unmatched sections (i.e., the flag does not affect it). - If a section fails to fit at any of its matches, the link fails instead of discarding the section. - The flag --enable-non-contiguous-regions-warnings is not implemented, as it exists to warn about such occurrences. The implementation places stubs at possible spill locations, and replaces them with the original input section when effecting spills. Spilling decisions occur after address assignment. Sections are spilled in reverse order of assignment, with each spill naively decreasing the size of the affected memory regions. This continues until the memory regions are brought back under size. Spilling anything causes another pass of address assignment, and this continues to fixed point. Spilling after rather than during assignment allows the algorithm to consider the size effects of unspillable input sections that appear later in the assignment. Otherwise, such sections (e.g. thunks) may force an overflow, even if spilling something earlier could have avoided it. A few notable feature interactions occur: - Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the input section were actually placed there. - SHF_MERGE synthetic sections use the spill list of their first contained input section (the one that gives the section its name). - ICF occurs oblivious to spill sections; spill lists for merged-away sections become inert and are removed after assignment. - SHF_LINK_ORDER and .ARM.exidx are ordered according to the final section ordering, after all spilling has completed. - INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.	2024-05-13 10:30:50 -07:00
Fangrui Song	04d0a691af	[ELF] Fix --compress-debug-sections=zstd when zlib is disabled	2024-05-07 16:56:45 -07:00
Fangrui Song	6d44a1ef55	[ELF] Adjust --compress-sections to support compression level zstd excels at scaling from low-ratio-very-fast to high-ratio-pretty-slow. Some users prioritize speed and prefer disk read speed, while others focus on achieving the highest compression ratio possible, similar to traditional high-ratio codecs like LZMA. Add an optional `level` to `--compress-sections` (#84855) to cater to these diverse needs. While we initially aimed for a one-size-fits-all approach, this no longer seems to work. (https://richg42.blogspot.com/2015/11/the-lossless-decompression-pareto.html) When --compress-debug-sections is used together, make --compress-sections take precedence since --compress-sections is usually more specific. Remove the level distinction between -O/-O1 and -O2 for --compress-debug-sections=zlib for a more consistent user experience. Pull Request: https://github.com/llvm/llvm-project/pull/90567	2024-05-01 11:40:46 -07:00
Fangrui Song	91fef0013f	[ELF] Catch zlib deflateInit2 error The function may return Z_MEM_ERROR or Z_STREAM_ERR. The former does not have a good way of testing. The latter will be possible with a pending change that allows setting the compression level, which will come with a test.	2024-05-01 11:32:04 -07:00
Fangrui Song	79095b4079	[ELF] --compress-debug-sections=zstd: replace ZSTD_c_nbWorkers parallelism with multi-frame parallelism https://reviews.llvm.org/D133679 utilizes zstd's multithread API to create one single frame. This provides a higher compression ratio but is significantly slower than concatenating multiple frames. With manual parallelism, it is easier to parallelize memcpy in OutputSection::writeTo. In addition, as the individual allocated decompression buffers are much smaller, we can make a wild guess (compressed_size/4) without worrying that a resize (due to a wrong guess) would waste memory.	2024-04-29 22:05:35 -07:00
Fangrui Song	0e47dfede4	[ELF] Add isStaticRelSecType to simplify SHT_REL/SHT_RELA testing. NFC and make it easier to introduce a new relocation format. https://discourse.llvm.org/t/rfc-relleb-a-compact-relocation-format-for-elf/77600 Pull Request: https://github.com/llvm/llvm-project/pull/85893	2024-03-20 09:58:56 -07:00
Fangrui Song	ea72c082bc	[ELF] Change getSymbolIndex to use const reference. NFC	2024-03-18 13:58:55 -07:00
Fangrui Song	f1ca2a0967	[ELF] Add --compress-sections to compress matched non-SHF_ALLOC sections --compress-sections <section-glob>=[none|zlib|zstd] is similar to --compress-debug-sections but applies to broader sections without the SHF_ALLOC flag. lld will report an error if a SHF_ALLOC section is matched. An interesting use case is to compress `.strtab`/`.symtab`, which consume a significant portion of the file size (15.1% for a release build of Clang). An older revision is available at https://reviews.llvm.org/D154641 . This patch focuses on non-allocated sections for safety. Moving `maybeCompress` as D154641 does not handle STT_SECTION symbols for `-r --compress-debug-sections=zlib` (see `relocatable-section-symbol.s` from #66804). Since different output sections may use different compression algorithms, we need CompressedData::type to generalize config->compressDebugSections. GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27452 Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674 Pull Request: https://github.com/llvm/llvm-project/pull/84855	2024-03-12 10:56:14 -07:00
Yaxun (Sam) Liu	3594769f20	[ELF] Define NOMINMAX to fix zlib.h caused build failure on Windows (#70368) On Windows, when zlib is enabled, the zlib header pulls in some Windows headers which define max as a macro. Since OutputSections.cpp uses std::max with a template argument, this causes a compilation error. Define the macro NOMINMAX to avoid this.	2023-11-02 08:59:54 -04:00
Fangrui Song	0cbe49eade	[ELF] Implement getImplicitAddend and enable checkDynamicRelocsDefault for PPC32	2023-09-15 22:49:18 -07:00
Fangrui Song	1b65b159da	[ELF] Enable checkDynamicRelocsDefault for PPC64 .plt and .branch_lt have the type of SHT_NOBITS and may be relocated by dynamic relocations with non-zero addends. They should be skipped for the --check-dynamic-relocations check, as --apply-dynamic-relocs does not apply. A side effect is that -z rel does not work for the two sections. Added two --apply-dynamic-relocs --check-dynamic-relocations tests. Also checked linking a PPC64 clang.	2023-09-15 22:38:18 -07:00
Simi Pallipurath	f146763e07	Revert "Revert "[lld][Arm] Big Endian - Byte invariant support."" This reverts commit d8851384c6ac2a1cea15e05228dbde5f13654e23. Reason: Applied the fix for the Asan buildbot failures.	2023-06-22 16:10:18 +01:00
Simi Pallipurath	d8851384c6	Revert "[lld][Arm] Big Endian - Byte invariant support." This reverts commit 8cf8956897ce9bca3176c6339077b1ca17b27abc.	2023-06-20 17:27:44 +01:00
Simi Pallipurath	8cf8956897	[lld][Arm] Big Endian - Byte invariant support. Arm has a big-endian configuration, BE8, called byte-invariant (every byte has the same address on little and big-endian systems). When in BE8 mode: 1. Instructions are big-endian in relocatable objects but little-endian in executables and shared objects. 2. Data is big-endian. 3. The data encoding of the ELF file is ELFDATA2MSB. To support BE8 without an ABI break for relocatable objects, the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8: 1. The linker sets the flag EF_ARM_BE8 in the ELF header. 2. The linker endian reverses the instructions, but not data. This patch adds BE8 big endian support for Arm. To endian reverse the instructions we'll need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions as half-words and ignore literal data. The only way to find these transitions precisely is by using mapping symbols. The instruction reversal will need to take place after relocation. For Arm BE8 code sections (sections with the SHF_EXECINSTR flag) we inserted a step after relocation to endian reverse the instructions. The implementation strategy I have used here is to write all sections BE32, including SyntheticSections, then endian reverse all code in InputSections via mapping symbols. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D150870	2023-06-20 14:08:21 +01:00
Fangrui Song	8d85c96e0e	[lld] StringRef::{starts,ends}with => {starts,ends}_with. NFC The latter form is now preferred to be similar to C++20 starts_with. This replacement also removes one function call when startswith is not inlined.	2023-06-05 14:36:19 -07:00
Leonard Chan	b9249a69cc	[lld][ELF] Do not emit warning for NOLOAD output sections Much of NOLOAD's intended use is to explicitly change the type of an output section, so we shouldn't flag these as warnings. Differential Revision: https://reviews.llvm.org/D151144	2023-05-23 20:41:20 +00:00
Fangrui Song	1408504564	[ELF] Name MergeSyntheticSection using an input section instead of the output section In a link map, the input section name gives more information. See the updated merge-entsize.s for an example. The output file is unchanged. Compiler generated input sections with the SHF_MERGE flag have names such as .rodata.str1.1 and .rodata.cstN, and are not affected by -fdata-sections. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D149466	2023-05-02 09:35:00 -07:00
Alexey Lapshin	fea8c07356	[Support][Parallel] Add sequential mode to TaskGroup::spawn(). This patch allows specifying that some tasks should be done in sequential order. It makes it possible to avoid a conditional for separating sequential tasks: `TaskGroup tg; for () { if(condition) fn(); else tg.spawn([](){fn();}); }` becomes `TaskGroup tg; for () { tg.spawn([](){fn();}, condition) }`. It also prevents execution on the main thread, which allows adding checks for the getThreadIndex() function discussed in D142318. The patch also replaces std::stack with std::deque in the ThreadPoolExecutor to have natural execution order in case (parallel::strategy.ThreadsRequested == 1). Differential Revision: https://reviews.llvm.org/D148728	2023-04-26 13:52:26 +02:00
Jez Ng	3df4c5a92f	[NFC] Optimize vector usage in lld By using emplace_back, as well as converting some loops to for-each, we can do more efficient vectorization. Make copy constructor for TemporaryFile noexcept. Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D139552	2023-01-26 20:31:42 -05:00
serge-sans-paille	984b800a03	Move from llvm::makeArrayRef to ArrayRef deduction guides - last part This is a follow-up to https://reviews.llvm.org/D140896, split into several parts as it touches a lot of files. Differential Revision: https://reviews.llvm.org/D141298	2023-01-10 11:47:43 +01:00
Guillaume Chatelet	08e2a76381	[lld][NFC] rename ELF alignment into addralign	2022-12-01 16:20:12 +00:00
Fangrui Song	1a50213ce7	[ELF] --compress-debug-sections=zstd: ignore error if zstd was not built with ZSTD_MULTITHREAD	2022-09-22 13:16:50 -07:00
Alex Brachet	38b20a02fe	[ELF] Fix std::min error on MacOs	2022-09-22 19:03:13 +00:00
Dmitri Gribenko	eda9fdc493	Fix -Wunused-local-typedef warning in some build configurations	2022-09-22 17:10:17 +02:00
Fangrui Song	fa74144c64	[ELF] Parallelize --compress-debug-sections=zstd See D117853: compressing debug sections is a bottleneck and therefore there is large value in parallelizing the step. zstd provides a multi-threading API and the output is deterministic even with different numbers of threads (see https://github.com/facebook/zstd/issues/2238). Therefore we can leverage it instead of using the pigz-style sharding approach. Also, switch to the default compression level 3. The current level 5 is significantly slower without providing a justifying size benefit. ``` 'dash b.sh 1' ran 1.05 ± 0.01 times faster than 'dash b.sh 3' 1.18 ± 0.01 times faster than 'dash b.sh 4' 1.29 ± 0.02 times faster than 'dash b.sh 5' level=1 size: 358946945 level=3 size: 309002145 level=4 size: 307693204 level=5 size: 297828315 ``` Reviewed By: andrewng, peter.smith Differential Revision: https://reviews.llvm.org/D133679	2022-09-21 11:13:03 -07:00
Fangrui Song	449f2ca146	[ELF] Add --compress-debug-sections=zstd `clang -gz=zstd a.o` passes this option to the linker. This option compresses output debug sections with zstd and sets ch_type to ELFCOMPRESS_ZSTD. As of today, very few DWARF consumers recognize ELFCOMPRESS_ZSTD. Use the llvm::zstd::compress API with level llvm::zstd::DefaultCompression (5), which we may tune after we have more experience with zstd output. zstd has built-in parallel compression support (so we don't need to do D117853 for zlib), which is not leveraged yet. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D133548	2022-09-09 10:30:18 -07:00
Fangrui Song	3b4d800911	[ELF] Parallelize writes of different OutputSections We currently process one OutputSection at a time and for each OutputSection write contained input sections in parallel. This strategy does not leverage multi-threading well. Instead, parallelize writes of different OutputSections. The default TaskSize for parallelFor often leads to inferior sharding. We prepare the task in the caller instead. * Move llvm::parallel::detail::TaskGroup to llvm::parallel::TaskGroup * Add llvm::parallel::TaskGroup::execute. * Change writeSections to declare TaskGroup and pass it to writeTo. Speed-up with --threads=8: * clang -DCMAKE_BUILD_TYPE=Release: 1.11x as fast * clang -DCMAKE_BUILD_TYPE=Debug: 1.10x as fast * chrome -DCMAKE_BUILD_TYPE=Release: 1.04x as fast * scylladb build/release: 1.09x as fast On M1, many benchmarks are a small fraction of a percentage faster. Mozilla showed the largest difference with the patch being about 1.03x as fast. Differential Revision: https://reviews.llvm.org/D131247	2022-08-24 09:40:03 -07:00
Fangrui Song	e0612c91cd	[ELF] Optimize getInputSections. NFC In the majority of cases (e.g. orphan sections), an OutputSection has at most one InputSectionDescription (isd). By changing the return type to ArrayRef<InputSection *> we can just reference the isd->sections. For OutputSections with more than one InputSectionDescription we use a caller provided SmallVector to copy the elements as before. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D129111	2022-07-05 23:31:09 -07:00
Nico Weber	7effcbda49	Rename parallelForEachN to just parallelFor Patch created by running: rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/' No behavior change. Differential Revision: https://reviews.llvm.org/D128140	2022-06-19 17:49:00 -04:00
Fangrui Song	b3d5bb3b30	[ELF] Change (NOLOAD) type mismatch to use SHT_NOBITS instead of SHT_PROGBITS Placing a non-SHT_NOBITS input section in an output section specified with (NOLOAD) is fishy but used by some projects. D118840 changed the output type to SHT_PROGBITS, but using the specified type seems to make more sense and improve GNU ld compatibility: `(NOLOAD)` seems to change the output section type regardless of input. I think we should keep the current type mismatch warning as it does indicate an error-prone usage. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D125074	2022-05-06 07:49:42 -07:00
Fangrui Song	6c814931bc	[ELF] Don't use multiple inheritance for OutputSection. NFC Add an OutputDesc class inheriting from SectionCommand. An OutputDesc wraps an OutputSection. This change allows InputSection::getParent to be inlined. Differential Revision: https://reviews.llvm.org/D120650	2022-03-08 11:23:42 -08:00
Fangrui Song	4976d1fe58	[ELF] Move SyntheticSection check from InputSection::writeTo to OutputSection::writeTo. NFC Simplify code and move the heavyweight operation to the call site so that it is clearer how to improve the inefficient scheduling in the future.	2022-02-27 23:28:52 -08:00
Fangrui Song	b01430a04f	[ELF] Don't rely on Symbols.h's transitive inclusion of InputFiles.h. NFC	2022-02-23 19:18:24 -08:00
Fangrui Song	cb0a4bb5be	[ELF] Change (NOLOAD) section type mismatch error to warning Making a (NOLOAD) section SHT_PROGBITS is fishy (the user may expect all-zero content, but the linker does not check that), but some projects (e.g. Linux kernel https://github.com/ClangBuiltLinux/linux/issues/1597) traditionally rely on the behavior. Issue a warning to not break them.	2022-02-18 11:20:36 -08:00
Fangrui Song	66f8ac8d36	[ELF] Support (TYPE=<value>) to customize the output section type The current output section type syntax only allows setting the ELF section type to SHT_PROGBITS or, with (NOLOAD), SHT_NOBITS. This patch allows an arbitrary section value to be specified. Some common SHT_* literal names are supported as well. ``` SECTIONS { note (TYPE=SHT_NOTE) : { BYTE(8) *(note) } init_array ( TYPE=14 ) : { QUAD(14) } fini_array (TYPE = SHT_FINI_ARRAY) : { QUAD(15) } } ``` When `sh_type` is specified, it is an error if an input section has a different type. Our syntax is compatible with GNU ld 2.39 (https://sourceware.org/bugzilla/show_bug.cgi?id=28841). Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D118840	2022-02-17 12:10:58 -08:00
Fangrui Song	27bb799095	[ELF] Clean up headers. NFC	2022-02-07 21:53:34 -08:00
Mariusz Ceier	e8bff9ae54	Fix lld standalone build lld/ELF/OutputSections.cpp includes llvm/Config/config.h for LLVM_ENABLE_ZLIB definition, but llvm/Config/config.h doesn't exist in standalone build. To fix this, this patch moves LLVM_ENABLE_ZLIB from config.h to llvm-config.h and updates OutputSections.cpp to include llvm-config.h instead of config.h Reviewed By: MaskRay, mgorny Differential Revision: https://reviews.llvm.org/D119058	2022-02-07 09:20:03 -08:00
Fangrui Song	5a2020d069	[ELF] copyShtGroup: replace unordered_set<uint32_t> with DenseSet<uint32_t>. NFC We don't need to support the empty/tombstone key section index.	2022-01-30 01:18:41 -08:00
Fangrui Song	f318fd9bf8	[ELF] crtbegin/crtend test: replace std::regex with hand-written matcher. NFC My x86-64 lld executable is 18KiB smaller.	2022-01-30 01:11:19 -08:00
Fangrui Song	fcd8817da5	[ELF] Simplify maybeCompress with lld::split. NFC	2022-01-30 00:44:19 -08:00
Fangrui Song	913914f0f8	[ELF] Simplify writing the Elf_Chdr header. NFC And avoiding changing `size` in `writeTo`.	2022-01-26 10:23:56 -08:00
Fangrui Song	2a80c3dbe1	[ELF] Clarify that Z_BEST_SPEED==1 in a comment. NFC	2022-01-25 22:40:53 -08:00
Fangrui Song	7438dbe078	[ELF] Cast size to size_t. NFC To fix ../../chromeclang/bin/../include/c++/v1/__algorithm/min.h:39:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long' vs. 'unsigned long long') on macOS arm64.	2022-01-25 22:38:24 -08:00
Fangrui Song	223f9dea3d	[ELF] maybeCompress: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC And mention that it is zero-initialized. I do not notice a speed-up if changed to be uninitialized by forcing the zero filler in writeTo.	2022-01-25 22:15:44 -08:00
Fangrui Song	4cdc441690	[ELF] Parallelize --compress-debug-sections=zlib When linking a Debug build clang (265MiB SHF_ALLOC sections, 920MiB uncompressed debug info), in a --threads=1 link "Compress debug sections" takes 2/3 of the time and in a --threads=8 link "Compress debug sections" takes ~70% of the time. This patch splits a section into 1MiB shards and calls zlib `deflate` in parallel. DEFLATE blocks are a bit sequence. We need to ensure every shard starts at a byte boundary for concatenation. We use Z_SYNC_FLUSH for all shards but the last to flush the output to a byte boundary. (Z_FULL_FLUSH can be used as well, but Z_FULL_FLUSH clears the hash table which just wastes time.) The last block requires the BFINAL flag. We call deflate with Z_FINISH to set the flag as well as flush the output to a byte boundary. Under the hood, all of Z_SYNC_FLUSH, Z_FULL_FLUSH, and Z_FINISH emit a non-compressed block (called stored block in zlib). RFC1951 says "Any bits of input up to the next byte boundary are ignored." In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the total speed is 2.54x. Because the hash table for one shard is not shared with the next shard, the output is slightly larger. Better compression ratio can be achieved by preloading the window size from the previous shard as dictionary (`deflateSetDictionary`), but that is overkill.
``` # 1MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.3% +129Ki [ = ] 0 .debug_str +0.1% +105Ki [ = ] 0 .debug_info +0.3% +101Ki [ = ] 0 .debug_line +0.2% +2.66Ki [ = ] 0 .debug_abbrev +0.0% +1.19Ki [ = ] 0 .debug_ranges +0.1% +341Ki [ = ] 0 TOTAL # 2MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.2% +74.2Ki [ = ] 0 .debug_line +0.1% +72.3Ki [ = ] 0 .debug_str +0.0% +69.9Ki [ = ] 0 .debug_info +0.1% +976 [ = ] 0 .debug_abbrev +0.0% +882 [ = ] 0 .debug_ranges +0.0% +218Ki [ = ] 0 TOTAL ``` Bonus in not using zlib::compress * we can compress a debug section larger than 4GiB * peak memory usage is lower because for most shards the output size is less than 50% input size (all less than 55% for a large binary I tested, but decreasing the initial output size does not decrease memory usage) Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D117853	2022-01-25 10:29:04 -08:00
Fangrui Song	a1c2ee0147	[ELF] LinkerScript/OutputSection: change other std::vector members to SmallVector 11+KiB smaller .text with both libc++ and libstdc++ builds.	2021-12-26 13:53:47 -08:00
Fangrui Song	bf7f3dd74e	[ELF] Move outSecOff addition from InputSection::writeTo to the caller Simplify the code a bit and improve consistency with SyntheticSection::writeTo.	2021-12-26 12:11:41 -08:00
Fangrui Song	ba948c5a9c	[ELF] Use SmallVector for some global variables (Files and Sections). NFC My lld executable is 26+KiB smaller.	2021-12-22 22:30:08 -08:00
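
The shard-and-concatenate scheme described in the 4cdc441690 entry above (Parallelize --compress-debug-sections=zlib) can be illustrated outside lld. The following is a minimal Python sketch, not lld's code: the function names `deflate_shard` and `compress_sharded`, the compression level, and the sequential loop (where lld would use worker threads) are all assumptions made for illustration. It only relies on the behavior stated in the commit message: Z_SYNC_FLUSH for every shard but the last, Z_FINISH for the last, then concatenation.

```python
import zlib

SHARD_SIZE = 1 << 20  # 1 MiB, the shard size quoted in the commit message


def deflate_shard(shard: bytes, last: bool, level: int = 1) -> bytes:
    """Compress one shard as a raw DEFLATE stream (hypothetical helper)."""
    # wbits=-15 selects raw DEFLATE: no zlib header or trailer per shard.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = comp.compress(shard)
    # Z_SYNC_FLUSH ends the shard on a byte boundary without setting BFINAL;
    # Z_FINISH sets BFINAL on the last block and also ends on a byte boundary.
    out += comp.flush(zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH)
    return out


def compress_sharded(data: bytes, shard_size: int = SHARD_SIZE) -> bytes:
    """Build one zlib stream from independently compressed shards."""
    shards = [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
    if not shards:
        shards = [b""]  # still emit one final block for empty input
    # Each shard uses its own compressor, so this loop could run in parallel.
    pieces = [
        deflate_shard(s, last=(i == len(shards) - 1))
        for i, s in enumerate(shards)
    ]
    header = b"\x78\x01"  # zlib header: deflate, 32 KiB window, no preset dict
    # The zlib trailer is the big-endian Adler-32 of the uncompressed data.
    trailer = zlib.adler32(data).to_bytes(4, "big")
    return header + b"".join(pieces) + trailer
```

A stock decoder such as `zlib.decompress(compress_sharded(data))` should return the original bytes, because a standard inflater simply keeps reading blocks until it sees BFINAL. As the commit message notes, the output is slightly larger than single-stream compression since no shard can reference data from the previous one.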

792 Commits