llvm-project

Author	SHA1	Message	Date
Kazu Hirata	fae34938f6	[llvm] Use *Set::insert_range (NFC) (#132591 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range with iterator ranges. For each case, I've verified that foos is defined as make_range(foo_begin(), foo_end()) or in a similar manner.	2025-03-22 22:14:45 -07:00
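A minimal standard-C++ sketch of the pattern this commit applies: when a container already exposes a `foos()` range built from `foo_begin()`/`foo_end()`, the whole range can be inserted in one call instead of passing the two iterators by hand. `IteratorRange`, `makeRange`, `insertRange` and `Owner` are hypothetical stand-ins for `llvm::iterator_range`, `llvm::make_range` and the `insert_range` member; they are not the LLVM API.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for llvm::iterator_range.
template <typename It> struct IteratorRange {
  It B, E;
  It begin() const { return B; }
  It end() const { return E; }
};

// Hypothetical stand-in for llvm::make_range.
template <typename It> IteratorRange<It> makeRange(It B, It E) { return {B, E}; }

// Hypothetical helper mirroring C++23-style insert_range: insert every element
// of an arbitrary range with one call.
template <typename SetT, typename RangeT>
void insertRange(SetT &S, const RangeT &R) {
  S.insert(R.begin(), R.end());
}

// Hypothetical owner whose foos() is defined as makeRange(foo_begin(), foo_end()),
// the property the commit message says was verified for each call site.
struct Owner {
  std::vector<int> Foos;
  auto foo_begin() const { return Foos.begin(); }
  auto foo_end() const { return Foos.end(); }
  auto foos() const { return makeRange(foo_begin(), foo_end()); }
};
```

Usage: `insertRange(S, O.foos())` replaces `S.insert(O.foo_begin(), O.foo_end())`; the rewrite is only behavior-preserving because `foos()` covers exactly the same iterator pair.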
Mingming Liu	d99033e4b4	[LTO][WPD] Suppress WPD on a class if the LTO unit doesn't have the prevailing definition of this class (#131721 ) Before this patch, whole program devirtualization is suppressed on a class if any superclass is visible to regular object files, by recording the class GUID in `VisibleToRegularObjSymbols`. This patch suppresses whole program devirtualization on a class if the LTO unit doesn't have the prevailing definition of this class (e.g., the prevailing definition is in a shared library). Implementation summary: 1. In llvm/lib/LTO/LTO.cpp, `IsVisibleToRegularObj` is updated to look at the global resolution's `IsPrevailing` bit for ThinLTO and regular LTO. 2. In llvm/tools/llvm-lto2/llvm-lto2.cpp, - three command-line options are added so `llvm-lto2` can override `Conf.HasWholeProgramVisibility`, `Conf.ValidateAllVtablesHaveTypeInfos` and `Conf.AllVtablesHaveTypeInfos`. The test case is reduced from a small C++ program (main.cc, lib.cc/h pasted below in [1]). To reproduce the program failure without this patch, compile lib.cc into a shared library, and provide it to a ThinLTO build of main.cc (commands are pasted in [2]). 
[1] * lib.h ``` #include <cstdio> class Derived { public: void dispatch(); virtual void print(); virtual void sum(); }; void Derived::dispatch() { static_cast<Derived*>(this)->print(); static_cast<Derived*>(this)->sum(); } void Derived::sum() { printf("Derived::sum\n"); } __attribute__((noinline)) void* create(int i); __attribute__((noinline)) void* getPtr(int i); ``` * lib.cc ``` #include "lib.h" #include <cstdio> #include <iostream> class Derived2 : public Derived { public: void print() override { printf("DerivedSharedLib\n"); } void sum() override { printf("DerivedSharedLib::sum\n"); } }; void Derived::print() { printf("Derived\n"); } __attribute__((noinline)) void* create(int i) { if (i & 1) return new Derived2(); return new Derived(); } ``` * main.cc ``` #include "lib.h" class DerivedN : public Derived { public: }; __attribute__((noinline)) void* getPtr(int x) { return new DerivedN(); } int main() { Derived* b = static_cast<Derived*>(create(201)); b->dispatch(); delete b; Derived* a = static_cast<Derived*>(getPtr(202)); a->dispatch(); delete a; return 0; } ``` [2] ``` # Compile lib.cc into a shared library. $ ./bin/clang++ -O2 -fPIC -c lib.cc -o lib.o $ ./bin/clang++ -shared -o libdata.so lib.o # Provide the shared library via `-ldata` $ ./bin/clang++ -v -g -ldata --save-temps -fno-discard-value-names -Wl,-mllvm,-print-before=wholeprogramdevirt -Wl,-mllvm,-wholeprogramdevirt-check=trap -Rpass=wholeprogramdevirt -Wl,--lto-whole-program-visibility -Wl,--lto-validate-all-vtables-have-type-infos -mllvm -disable-icp=true -Wl,-mllvm,-disable-icp=false -flto=thin -fwhole-program-vtables -fno-split-lto-unit -fuse-ld=lld main.cc -L . -o main >/tmp/wholeprogramdevirt.ir 2>&1 # Running the program traps because it was built with `-Wl,-mllvm,-wholeprogramdevirt-check=trap` $ LD_LIBRARY_PATH=. ./main DerivedSharedLib Trace/breakpoint trap (core dumped) ```	2025-03-19 22:10:57 -07:00
Vitaly Buka	7dd5f23279	[NFC][LTO] Move GUID calculation into CfiFunctionIndex (#130370 ) Preparation for CFI Index refactoring, which will fix O(N^2) in ThinLTO indexing.	2025-03-07 18:59:48 -08:00
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of the unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
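A minimal sketch of the "parse once, store the parsed form" idea, assuming a heavily simplified triple that is just split on `-`. `ParsedTriple` and `ModuleSketch` are hypothetical and far simpler than `llvm::Triple`/`llvm::Module`. Note that the constructor is `explicit` and the string form is only available via `str()`, mirroring the commit's choice of explicit conversions over implicit ones.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for llvm::Triple: the string is split into
// components once, at construction, instead of on every query.
class ParsedTriple {
public:
  explicit ParsedTriple(const std::string &Str) : Data(Str) {
    std::stringstream SS(Str);
    std::string Part;
    while (std::getline(SS, Part, '-'))
      Parts.push_back(Part);
  }
  const std::string &str() const { return Data; } // explicit string access
  std::string getArchName() const { return Parts.empty() ? "" : Parts[0]; }
  std::string getOSName() const { return Parts.size() > 2 ? Parts[2] : ""; }

private:
  std::string Data;
  std::vector<std::string> Parts;
};

// Hypothetical module that stores the parsed triple, so callers can query it
// directly without constructing a new triple object from a string.
class ModuleSketch {
public:
  void setTargetTriple(ParsedTriple T) { TT = std::move(T); }
  const ParsedTriple &getTargetTriple() const { return TT; }

private:
  ParsedTriple TT{""}; // default mirrors Triple("") rather than Triple()
};
```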
Rahul Joshi	6924fc0326	[LLVM] Add `Intrinsic::getDeclarationIfExists` (#112428 ) Add `Intrinsic::getDeclarationIfExists` to look up an existing declaration of an intrinsic in a `Module`.	2024-10-16 07:21:10 -07:00
Kyungwoo Lee	e547d041fa	Fix build failure for [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933 )	2024-10-09 16:09:22 -07:00
Kyungwoo Lee	dc85d5263e	[CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933 ) This feature is enabled by `-codegen-data-thinlto-two-rounds`, which effectively runs the `-codegen-data-generate` and `-codegen-data-use` in two rounds to enable global outlining with ThinLTO. 1. The first round: Run both optimization + codegen with a scratch output. Before running codegen, we serialize the optimized bitcode modules to a temporary path. 2. From the scratch object files, we merge them into the codegen data. 3. The second round: Read the optimized bitcode modules and start the codegen only this time. Using the codegen data, the machine outliner effectively performs the global outlining. Depends on #90934, #110461 and #110463. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-10-09 15:37:41 -07:00
Kyungwoo Lee	1b53aaec55	[ThinLTO][NFC] Refactor ThinBackend (#110461 ) This is a prep for https://github.com/llvm/llvm-project/pull/90933. - Change `ThinBackend` from a function to a type. - Store the parallelism level in the type, which will be used when creating two-codegen round backends that inherit this value. - `ThinBackendProc` is hoisted to `LTO.h` from `LTO.cpp` to provide its body for `ThinBackend`. However, `emitFiles()` is still implemented separately in `LTO.cpp`, distinct from its parent class.	2024-10-07 22:24:04 -07:00
Nuri Amari	2edd897a42	Make WriteIndexesThinBackend multi threaded (#109847 ) We've noticed that for large builds executing thin-link can take on the order of 10s of minutes. We are only using a single thread to write the sharded indices and import files for each input bitcode file. While we need to ensure the index file produced lists modules in a deterministic order, that doesn't prevent us from executing the rest of the work in parallel. In this change we use a thread pool to execute as much of the backend's work as possible in parallel. In local testing on a machine with 80 cores, this change makes a thin-link for ~100,000 input files run in ~2 minutes. Without this change it takes upwards of 10 minutes. --------- Co-authored-by: Nuri Amari <nuriamari@fb.com>	2024-10-07 08:16:46 -07:00
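A minimal sketch of "parallel work, deterministic output order", assuming `std::async` in place of LLVM's thread pool and a hypothetical `processModule` standing in for writing one module's sharded index and imports file. The results are gathered by walking the futures in input order, so the ordering does not depend on which task finishes first. Unlike a real thread pool, this sketch launches one task per module, which would not scale to ~100,000 inputs.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-module work item (the real backend writes index and
// imports files; here we just derive an output name).
std::string processModule(const std::string &ModulePath) {
  return ModulePath + ".thinlto.bc";
}

// Runs the per-module work concurrently but collects results in input order,
// so anything that lists modules (e.g. an index file) stays deterministic.
std::vector<std::string> processAll(const std::vector<std::string> &Modules) {
  std::vector<std::future<std::string>> Pending;
  Pending.reserve(Modules.size());
  for (const std::string &M : Modules)
    Pending.push_back(std::async(std::launch::async, processModule, M));

  std::vector<std::string> Results;
  Results.reserve(Pending.size());
  for (auto &F : Pending)
    Results.push_back(F.get()); // blocks per task; order follows the input
  return Results;
}
```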
Kyungwoo Lee	ed59d571f2	[ThinLTO][NFC] Refactor FileCache (#110463 ) This is a prep for https://github.com/llvm/llvm-project/pull/90933. - Change `FileCache` from a function to a type. - Store the cache directory in the type, which will be used when creating additional caches for two-codegen round runs that inherit this value.	2024-10-04 07:50:28 -07:00
Kyungwoo Lee	c1959813d6	[CGData][ThinLTO][NFC] Prep for two-codegen rounds (#90934 ) This is NFC for https://github.com/llvm/llvm-project/pull/90933. - Create a lambda function, `RunBackends`, to group the backend operations into a single function. - Explicitly pass the `CodeGenOnly` argument to thinBackend, instead of depending on a configuration value. Depends on https://github.com/llvm/llvm-project/pull/90304. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-10-03 09:58:01 -07:00
Kazu Hirata	3dad29b677	[LTO] Remove unused includes (NFC) (#108110 ) clangd reports these as unused headers. My manual inspection agrees with the findings.	2024-09-10 19:36:04 -07:00
Mingming Liu	09b231cb38	Re-apply "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107792 ) Fix the use-after-free bug and re-apply https://github.com/llvm/llvm-project/pull/106193 * Without the fix, the string referenced by `objSym.Name` could be destroyed even if the string saver keeps a copy of the referenced string. This caused a use-after-free. * The fix ([latest commit](`9776ed44cf`)) updates `objSym.Name` to reference (via `StringRef`) the string saver's copy. Test: 1. For `lld/test/ELF/lto/asmundef.ll`, its test failure is reproducible with `-DLLVM_USE_SANITIZER=Address` and gone with the fix. 2. Run all tests by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild#try-local-changes. * Without the fix, `ELF/lto/asmundef.ll` aborted the multi-stage test at `@@@BUILD_STEP stage2/asan_ubsan check@@@`, defined [here](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh#L30) * With the fix, the [multi-stage test](https://github.com/llvm/llvm-zorg/blob/main/zorg/buildbot/builders/sanitizers/buildbot_fast.sh) passes stage2 {asan, ubsan, masan}. This is also the test used by https://lab.llvm.org/buildbot/#/builders/169 Original commit message: `StringMap<T>` creates a [copy of the string](`d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58)`) for entry insertions and intentionally keeps copies [since the implementation optimizes string memory usage](`d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)`). On the other hand, the linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3]. This change proposes to optimize away string copies inside [LTO::GlobalResolutions](`24e791b416/llvm/include/llvm/LTO/LTO.h (L409)`), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats. The optimization takes place for lld (ELF) only. 
For the rest of the use cases (gold plugin, `llvm-lto2`, etc.), LTO owns a string saver to keep copies and uses the global resolution key for de-duplication. Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in `329ba523cc`, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage. * Regarding correctness, the set of [resolved](`80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739)`) [per-module symbols](`80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191)`) is a subset of [llvm::lto::InputFile::Symbols](`80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)`). And bitcode symbol parsing already saves symbol names when iterating `obj->symbols` in `BitcodeFile::parse`. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols. * Presumably the undefined symbols in an LTO unit (copied by this patch into the linker's unique saver) are a small set compared with the set of symbols in the global resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks. [1] ELF `1cea5c2138/lld/ELF/InputFiles.cpp (L1748)` [2] `ef7b18a53c/lld/ELF/Driver.cpp (L2863)` [3] `ef7b18a53c/lld/ELF/Driver.cpp (L2995)`	2024-09-09 11:16:58 -07:00
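A minimal sketch of the ownership rule behind the use-after-free fix: a map keyed by non-owning views must key on the saver's stable copy, never on the caller's buffer. `StringSaverSketch` and `ResolutionTable` are hypothetical, using `std::string_view` and `std::deque` in place of `StringRef` and LLVM's `StringSaver`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a StringSaver. std::deque::emplace_back never
// relocates existing elements, so views into earlier copies remain valid.
class StringSaverSketch {
public:
  std::string_view save(std::string_view S) {
    Storage.emplace_back(S);
    return Storage.back();
  }

private:
  std::deque<std::string> Storage;
};

struct Resolution {
  bool Prevailing = false;
};

// Hypothetical global-resolution table keyed by non-owning views.
class ResolutionTable {
public:
  Resolution &lookup(std::string_view Name) {
    auto It = Map.find(Name);
    if (It != Map.end())
      return It->second;
    // Key by the saver's copy, not by Name: the caller's buffer may be
    // destroyed while the table is still alive.
    std::string_view Stable = Saver.save(Name);
    return Map[Stable];
  }
  std::size_t size() const { return Map.size(); }

private:
  StringSaverSketch Saver; // declared first, so it outlives Map's keys
  std::unordered_map<std::string_view, Resolution> Map;
};
```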
Mingming Liu	1cc4c87198	Revert "[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF" (#107788 ) Reverts llvm/llvm-project#106193 while investigating bot failures https://lab.llvm.org/buildbot/#/builders/169/builds/2989/steps/9/logs/stdio	2024-09-08 16:45:59 -07:00
Mingming Liu	9ade4e2646	[NFCI][LTO][lld] Optimize away symbol copies within LTO global resolution in ELF (#106193 ) `StringMap<T>` creates a [copy of the string](`d4c519e7b2/llvm/include/llvm/ADT/StringMapEntry.h (L55-L58)`) for entry insertions and intentionally keeps copies [since the implementation optimizes string memory usage](`d4c519e7b2/llvm/include/llvm/ADT/StringMap.h (L124)`). On the other hand, the linker keeps copies of symbol names [1] in `lld::elf::parseFiles` [2] before invoking `compileBitcodeFiles` [3]. This change proposes to optimize away string copies inside [LTO::GlobalResolutions](`24e791b416/llvm/include/llvm/LTO/LTO.h (L409)`), which will make LTO indexing more memory efficient for ELF. There are similar opportunities for other (COFF, wasm, MachO) formats. The optimization takes place for lld (ELF) only. For the rest of the use cases (gold plugin, `llvm-lto2`, etc.), LTO owns a string saver to keep copies and uses the global resolution key for de-duplication. Together with @kazutakahirata's work to make `ComputeCrossModuleImport` more memory efficient, we see a ~20% peak memory usage reduction in a binary where peak memory usage needs to go down. Thanks to the optimization in `329ba523cc`, the max (as opposed to the sum) of `ComputeCrossModuleImport` or `GlobalResolution` shows up in peak memory usage. * Regarding correctness, the set of [resolved](`80c47ad3ae/llvm/lib/LTO/LTO.cpp (L739)`) [per-module symbols](`80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L188-L191)`) is a subset of [llvm::lto::InputFile::Symbols](`80c47ad3ae/llvm/include/llvm/LTO/LTO.h (L120)`). And bitcode symbol parsing already saves symbol names when iterating `obj->symbols` in `BitcodeFile::parse`. This change updates `BitcodeFile::parseLazy` to keep copies of per-module undefined symbols. 
* Presumably the undefined symbols in an LTO unit (copied by this patch into the linker's unique saver) are a small set compared with the set of symbols in the global resolution (copied before this patch), making this a worthwhile trade-off. Benchmarking this change alone shows measurable memory savings across various benchmarks. [1] ELF `1cea5c2138/lld/ELF/InputFiles.cpp (L1748)` [2] `ef7b18a53c/lld/ELF/Driver.cpp (L2863)` [3] `ef7b18a53c/lld/ELF/Driver.cpp (L2995)`	2024-09-08 14:52:03 -07:00
Mingming Liu	d4ddf06b0c	[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471 ) The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of `64498c5483`). While I'm at it, this PR cleans up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to invest further in them. With this patch, the bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and the bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump the bitcode version.	2024-09-06 16:38:17 -07:00
Nick Sarnie	fedc7556ad	[ThinLTO] Don't always print ModulesToCompile debugging information (#106769 ) Nothing went wrong in this case; we just successfully matched a module by identifier. There is no need to print to stderr as we would for something that should be user-visible. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2024-09-03 07:50:23 -07:00
Kazu Hirata	5c0d61e318	[LTO] Reduce memory usage for import lists (#106772 ) This patch reduces the memory usage for import lists by employing memory-efficient data structures. With this patch, an import list for a given destination module is basically DenseSet<uint32_t> with each element indexing into the deduplication table containing tuples of {SourceModule, GUID, Definition/Declaration}. In one of our large applications, the peak memory usage goes down by 9.2% from 6.120GB to 5.555GB during the LTO indexing step. This patch addresses several sources of space inefficiency associated with std::unordered_map: - Each std::unordered_map<GUID, ImportKind> entry takes up 16 bytes because of padding even though ImportKind only carries one bit of information. - std::unordered_map uses pointers to elements, both in the hash table proper and for collision chains. - We allocate an instance of std::unordered_map for each {Destination Module, Source Module} pair for which we have at least one import. Most import lists have fewer than 10 imports, so metadata such as the size of std::unordered_map and the pointer to the hash table costs a lot relative to the actual contents.	2024-09-01 08:36:06 -07:00
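A minimal sketch of the layout described above: distinct `{SourceModule, GUID, kind}` tuples are stored once in a shared table, and each per-destination import list holds only 4-byte indices into it. `ImportIDTable`, `ImportEntry` and `ImportList` are hypothetical names, and `std::unordered_set<uint32_t>` / `std::map` stand in for `llvm::DenseSet<uint32_t>` and whatever lookup structure the patch actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <unordered_set>
#include <vector>

enum class ImportKind : uint8_t { Definition, Declaration };

// One deduplicated import: {SourceModule, GUID, Definition/Declaration}.
using ImportEntry = std::tuple<std::string, uint64_t, ImportKind>;

// Hypothetical deduplication table shared by all import lists: each distinct
// entry is stored once and identified by a 32-bit index.
class ImportIDTable {
public:
  uint32_t getID(const ImportEntry &E) {
    auto [It, Inserted] =
        IDs.try_emplace(E, static_cast<uint32_t>(Entries.size()));
    if (Inserted)
      Entries.push_back(E);
    return It->second;
  }
  const ImportEntry &lookup(uint32_t ID) const {
    assert(ID < Entries.size() && "unknown import ID");
    return Entries[ID];
  }

private:
  std::map<ImportEntry, uint32_t> IDs; // entry -> index
  std::vector<ImportEntry> Entries;    // index -> entry
};

// Per-destination import list: a set of indices into the shared table.
using ImportList = std::unordered_set<uint32_t>;
```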
Kazu Hirata	4f15039cf2	[LTO] Introduce new type alias ImportListsTy (NFC) (#106420 ) The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. Once this patch lands, I'm planning to change the type slightly. The new type alias allows us to update the type without touching many places.	2024-08-28 10:42:12 -07:00
Kazu Hirata	dbd7ce0ccd	[IR] Introduce ModuleToSummariesForIndexTy (NFC) (#105906 ) This patch introduces type alias ModuleToSummariesForIndexTy. I'm planning to change the type slightly to allow heterogeneous lookup (that is, std::map<K, V, std::less<>>) in a subsequent patch. The problem is that changing the type affects many places. Using a type alias reduces the impact.	2024-08-23 17:32:52 -07:00
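A minimal sketch of what heterogeneous lookup with `std::less<>` buys: because the comparator is transparent, `find` accepts a `std::string_view` without building a temporary `std::string` key. The alias name `SummariesByModule` and its `std::set<unsigned>` value type are hypothetical placeholders; the commit does not spell out the real key and value types.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <string_view>

// std::less<> (std::less<void>) is transparent, so std::map::find has an
// overload taking any type comparable with the key, e.g. std::string_view.
using SummariesByModule =
    std::map<std::string, std::set<unsigned>, std::less<>>;

bool hasModule(const SummariesByModule &M, std::string_view Name) {
  return M.find(Name) != M.end(); // no std::string temporary is constructed
}
```

Hiding the map behind a type alias first, as this commit does, means the later switch to `std::less<>` only has to touch the alias.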
Kazu Hirata	3563907969	[LTO] Turn ImportMapTy into a proper class (NFC) (#105748 ) This patch turns type alias ImportMapTy into a proper class to provide a more intuitive interface like: ImportList.addDefinition(...) as opposed to: FunctionImporter::addDefinition(ImportList, ...) Also, this patch requires all non-const accesses to go through addDefinition, maybeAddDeclaration, and addGUID while providing const accesses via: const ImportMapTyImpl &getImportMap() const { return ImportMap; } I realize ImportMapTy may not be the best name as a class (maybe OK as a type alias). I am not renaming ImportMapTy in this patch at least because there are 47 mentions of ImportMapTy under llvm/.	2024-08-22 21:56:01 -07:00
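A minimal sketch of turning a bare map alias into a class whose mutations go through named members while reads use a const accessor. `ImportMapSketch` is hypothetical; the semantics given to `addDefinition` (a definition overrides a declaration) and `maybeAddDeclaration` (only add if nothing is recorded yet) are assumptions read off the member names, and `addGUID` is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

class ImportMapSketch {
public:
  enum class Kind { Definition, Declaration };
  using GUID = uint64_t;
  using ImportMapTyImpl = std::map<std::string, std::map<GUID, Kind>>;

  // Assumed semantics: importing a definition always wins.
  void addDefinition(const std::string &FromModule, GUID G) {
    ImportMap[FromModule][G] = Kind::Definition;
  }
  // Assumed semantics: record a declaration only if G is not imported yet.
  void maybeAddDeclaration(const std::string &FromModule, GUID G) {
    ImportMap[FromModule].try_emplace(G, Kind::Declaration);
  }
  // All non-mutating callers go through this const view.
  const ImportMapTyImpl &getImportMap() const { return ImportMap; }

private:
  ImportMapTyImpl ImportMap;
};
```

Usage: `ImportList.addDefinition("m.o", G)` replaces a free-function call such as `FunctionImporter::addDefinition(ImportList, ...)`, and callers can no longer mutate the map directly.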
Kazu Hirata	5ddc79b093	[LTO] Use a range-based for loop (NFC) (#105467 )	2024-08-21 07:23:30 -07:00
Kazu Hirata	d6d8243dcd	[LTO] Use DenseSet in computeLTOCacheKey (NFC) (#105466 ) The two instances of std::set are used only for membership checking purposes in computeLTOCacheKey. We do not need std::set's strengths like iterators staying valid or the ability to traverse in a sorted order. This patch changes them to DenseSet. While I am at it, this patch replaces count with contains for slightly increased readability.	2024-08-21 07:20:23 -07:00
Kazu Hirata	0f22d47a7a	[LTO] Teach computeLTOCacheKey to return std::string (NFC) (#105331 ) Without this patch, computeLTOCacheKey computes SHA1, creates its hexadecimal representation with toHex, which returns std::string, and then copies it to an output parameter of type SmallString. This patch removes the redirection and teaches computeLTOCacheKey to directly return std::string computed by toHex. With the move semantics, no buffer copy should be involved. While I am at it, this patch adds a Twine to concatenate two strings.	2024-08-20 20:56:47 -07:00
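A minimal sketch of the "return the string directly" change, assuming a hypothetical `toHexSketch` in place of `llvm::toHex` (uppercase digits are a choice of this sketch) and a hypothetical `computeKeySketch` in place of `computeLTOCacheKey`. Returning a local `std::string` by value is eligible for NRVO and is at worst moved, so no output-parameter copy is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical hex encoder that returns its result by value.
std::string toHexSketch(const std::vector<uint8_t> &Bytes) {
  static const char Digits[] = "0123456789ABCDEF";
  std::string Out;
  Out.reserve(Bytes.size() * 2);
  for (uint8_t B : Bytes) {
    Out.push_back(Digits[B >> 4]);  // high nibble
    Out.push_back(Digits[B & 0xF]); // low nibble
  }
  return Out; // NRVO or move; no copy into an out-parameter
}

// Before: compute into a SmallString out-parameter via a copy.
// After (sketched): simply return what the encoder produced.
std::string computeKeySketch(const std::vector<uint8_t> &Digest) {
  return toHexSketch(Digest);
}
```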
Joseph Huber	615b7eeaa9	Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling, it's a little awkward but it's the easiest solution I can think of for now.	2024-07-20 09:29:31 -05:00
NAKAMURA Takumi	5893b1e297	Reformat	2024-07-20 12:36:57 +09:00
NAKAMURA Takumi	740161a9b9	Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610	2024-07-20 12:36:57 +09:00
Joseph Huber	c05126bdfc	[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 ) Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevents internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step, but do not want the overhead of uncalled functions. (This trivially adds about a second to the link time.)	2024-07-16 06:22:09 -05:00
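A minimal sketch of the opt-out convention described above: a per-target name table in which `nullptr` marks an unsupported libcall, filtered down to the set of symbols that actually need to be kept. `RuntimeLibcallTable` and `preservedLibcalls` are hypothetical names, not the class this patch adds.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-target table of runtime library call names; a target
// opts out of a libcall by leaving its entry as nullptr.
struct RuntimeLibcallTable {
  std::vector<const char *> Names;
};

// Collect only the libcalls the target supports, i.e. the symbols LTO or the
// linker would need to keep (not internalize) and possibly extract.
std::set<std::string> preservedLibcalls(const RuntimeLibcallTable &T) {
  std::set<std::string> Keep;
  for (const char *Name : T.Names)
    if (Name) // nullptr: unsupported on this target, safe to drop
      Keep.insert(Name);
  return Keep;
}
```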
Mingming Liu	50fea9943f	Reland "[ThinLTO][Bitcode] Generate import type in bitcode" (#97253 ) https://github.com/llvm/llvm-project/pull/87600 was reverted in order to revert `6262763341`. Now https://github.com/llvm/llvm-project/pull/95482 is a fix-forward for `6262763341`. This patch is a reland of https://github.com/llvm/llvm-project/pull/87600. Changes on top of the original patch: In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of `GVSummaryPtrSet` an `unordered_set`, which is more memory efficient when the number of elements is smaller than 128 [1]. Original commit message: For distributed ThinLTO, the LTO indexing step generates a combined summary for each module, and the postlink pipeline reads the combined summary, which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates the bitcode reader to parse the bit correctly. [1] `393eff4e02/llvm/lib/Support/SmallPtrSet.cpp (L43)`	2024-07-08 22:20:33 -07:00
Mingming Liu	af784a5c13	[ThinLTO] Use a set rather than a map to track exported ValueInfos. (#97360 ) https://github.com/llvm/llvm-project/pull/95482 is a reland of https://github.com/llvm/llvm-project/pull/88024. https://github.com/llvm/llvm-project/pull/95482 keeps indexing memory usage reasonable by using unordered_map and doesn't make other changes to the originally reviewed code. While discussing possible ways to minimize indexing memory usage, Teresa asked whether `ExportSetTy` needs to be a map or whether a set is sufficient. This PR implements the idea. It uses a set rather than a map to track exported ValueInfos. Currently, `ExportLists` has two use cases, and neither needs to track a ValueInfo's import/export status. So using a set is sufficient and correct. 1) In both in-process and distributed ThinLTO, it's used to decide if a function or global variable is visible [1] from another module after importing creates additional cross-module references. * If a cross-module call edge is seen today, the callee must already be visible to another module, without tracking its export status. For instance, this [2] is how callees of direct calls get exported. 2) For in-process ThinLTO [3], it's used to compute the LTO cache key. * The cache key computation already hashes [4] 'ImportList', and 'ExportList' is determined by 'ImportList'. So it's fine not to track 'import type' for the export list. [1] `66cd8ec4c0/llvm/lib/LTO/LTO.cpp (L1815-L1819)` [2] `66cd8ec4c0/llvm/lib/LTO/LTO.cpp (L1783-L1794)` [3] `66cd8ec4c0/llvm/lib/LTO/LTO.cpp (L1494-L1496)` [4] `b76100e220/llvm/lib/LTO/LTO.cpp (L194-L222)`	2024-07-03 13:15:17 -07:00
Mingming Liu	8d9db947b7	Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#95482 ) Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`. Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This is a reland of https://github.com/llvm/llvm-project/pull/92718 * `DenseMap` allocates space for a large number of key/value pairs and wastes space when the number of elements is small. * While the initial bucket count is zero [1], it quickly allocates buckets for 64 elements [2] when the number of elements is small (for example, 3 or 4 elements). The Programmer's Manual [3] also mentions it could waste space. * Experiments show `FunctionsToImportTy.size()` is smaller than 4 for multiple binaries with high indexing RAM usage. `unordered_map`'s growth factor is at most 2 in libc++ [4] for insert operations. With this change, the `ComputeCrossModuleImport` RAM increase is smaller than 0.5G on a couple of binaries with high indexing RAM usage. A wider range of (pre-release) tests pass. [1] `ad79a14c9e/llvm/include/llvm/ADT/DenseMap.h (L431-L432)` [2] `ad79a14c9e/llvm/include/llvm/ADT/DenseMap.h (L849)` [3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h [4] `ad79a14c9e/libcxx/include/__hash_table (L1525-L1526)` Original commit message: The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be a no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. 
Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195)) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this.	2024-06-20 10:50:31 -07:00
Nikita Popov	49ae2dcf36	[PassManager] Remove some unnecessary includes (NFC) (#96175 ) SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only needed for the UseNewDbgInfoFormat declare, which can be moved to the places that need it.	2024-06-20 17:41:35 +02:00
Mingming Liu	707f4de428	Revert "Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92718 ) (#94503 ) This reverts commit e33db249b53fb70dce62db3ebd82d42239bd1d9d. The change from set to map increases memory usage, and caused indexing OOM in some applications. Need to profile offline to bring the memory usage down.	2024-06-05 10:06:55 -07:00
Mingming Liu	53061eecdb	Revert "[ThinLTO][Bitcode] Generate import type in bitcode (#87600 )" (#94502 ) This reverts commit 6262763341fcd71a2b0708cf7485f9abd1d26ba8, to prepare for the revert of https://github.com/llvm/llvm-project/pull/92718. https://github.com/llvm/llvm-project/pull/92718 causes LTO indexing OOM in some applications.	2024-06-05 09:59:46 -07:00
Mingming Liu	6262763341	[ThinLTO][Bitcode] Generate import type in bitcode (#87600 ) For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly.	2024-05-22 09:52:54 -07:00
Mingming Liu	e33db249b5	Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92718 ) The original PR was reviewed in https://github.com/llvm/llvm-project/pull/88024, and this PR adds one line (`b9f04d199d`) to fix the test: limit in-process ThinLTO to one thread when testing the `LLVM_DEBUG` log. - This should fix build bot failures like https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and https://lab.llvm.org/buildbot/#/builders/9/builds/43876 - I could repro the failure and see interleaved log messages by using `-thinlto-threads=all` Original Commit Message: The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be a no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195)) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this.	2024-05-20 08:55:31 -07:00
Mingming Liu	6b0733e3a3	Revert "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" (#92715 ) Reverts llvm/llvm-project#88024 Build bot failures (https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and https://lab.llvm.org/buildbot/#/builders/9/builds/43876)	2024-05-19 22:42:18 -07:00
Mingming Liu	8de7890572	[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option (#88024 ) The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be a no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases (better call-graph sort [1] and cross-module auto-init [2]) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this. [1] https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5 [2] https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195	2024-05-19 22:22:47 -07:00
Mingming Liu	d34be649af	[ThinLTO]Sort imported GUIDs before cache key update (#92622 ) Add 'sort' here since it's helpful when the container type changes (for example, https://github.com/llvm/llvm-project/pull/88024 wants to change the container type from `unordered_set` to `DenseMap`). @MaskRay points out that `std::` doesn't randomize the iteration order of `unordered_{set,map}`, and the iteration order for a single build is deterministic.	2024-05-18 19:39:57 -07:00
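A minimal sketch of why sorting before hashing helps: copying the GUIDs out of an unordered container and sorting them makes the key depend only on which GUIDs are present, not on the container's iteration order. The FNV-1a style `mix` function is a hypothetical stand-in for the SHA1 hasher used for the real cache key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// FNV-1a over the 8 bytes of V (stand-in for feeding a SHA1 hasher).
uint64_t mix(uint64_t H, uint64_t V) {
  for (int I = 0; I < 8; ++I) {
    H ^= (V >> (8 * I)) & 0xFF;
    H *= 1099511628211ULL; // FNV-1a 64-bit prime
  }
  return H;
}

// Sort a copy of the GUIDs before hashing so that the resulting key is
// independent of the container type and its iteration order.
uint64_t cacheKeyForImports(const std::unordered_set<uint64_t> &ImportedGUIDs) {
  std::vector<uint64_t> Sorted(ImportedGUIDs.begin(), ImportedGUIDs.end());
  std::sort(Sorted.begin(), Sorted.end());
  uint64_t H = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
  for (uint64_t G : Sorted)
    H = mix(H, G);
  return H;
}
```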
Kazu Hirata	d6bf04f476	[LTO] Remove extraneous ArrayRef (NFC) (#90306 ) We don't need to explicitly create these instances of ArrayRef because Hasher::update takes ArrayRef, and ArrayRef can be implicitly constructed from C arrays.	2024-04-26 18:38:15 -07:00
Orlando Cazalet-Hyams	b3f98dff75	[RemoveDIs] Load into new debug info format by default in llvm-lto and llvm-lto2 (#86271 ) Directly load all bitcode into the new debug info format in `llvm-lto` and `llvm-lto2`. This means that new-mode bitcode no longer round-trips back to old-mode after parsing, and that old-mode bitcode gets auto-upgraded to new-mode debug info (which is the current in-memory default in LLVM).	2024-03-22 13:52:11 +00:00
Fangrui Song	a331937197	[MC] Move CompressDebugSections/RelaxELFRelocations from TargetOptions/MCAsmInfo to MCTargetOptions The convention is for such MC-specific options to reside in MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do not follow the convention: `CompressDebugSections` is defined in both TargetOptions and MCAsmInfo and there is forwarding complexity. Move the option to MCTargetOptions and thereby simplify the code. Rename the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc -relax-relocations and llc -x86-relax-relocations can now be unified.	2024-03-06 23:19:59 -08:00
Mehdi Amini	716042a63f	Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702 ) The base class llvm::ThreadPoolInterface will be renamed llvm::ThreadPool in a subsequent commit. This is a breaking change: clients who used to create a ThreadPool must now create a DefaultThreadPool instead.	2024-03-05 18:00:46 -08:00
Jan Svoboda	695b630ae1	[ThinLTO] NFC: Merge duplicated functions together (#82421 )	2024-02-26 09:44:01 -08:00
Mehdi Amini	744616b3ae	Rename `ThreadPool::getThreadCount()` to `getMaxConcurrency()` (NFC) (#82296 ) This addresses a long-standing TODO to rename this misleading API. The old one is preserved for now but marked deprecated.	2024-02-19 18:07:12 -08:00
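The rename-with-deprecation pattern can be sketched with the standard `[[deprecated]]` attribute. The `Pool` class below is hypothetical and only mirrors the shape of the change: the new name carries the implementation, and the old name forwards to it so existing callers still compile (with a warning).

```cpp
#include <cassert>

// Hypothetical class illustrating the pattern; not LLVM's ThreadPool.
class Pool {
public:
  explicit Pool(unsigned MaxConcurrency) : MaxConcurrency(MaxConcurrency) {}

  // New name: reports the configured concurrency limit.
  unsigned getMaxConcurrency() const { return MaxConcurrency; }

  // Old name kept for source compatibility; forwards to the new one and
  // triggers a deprecation warning at each call site.
  [[deprecated("use getMaxConcurrency()")]]
  unsigned getThreadCount() const { return getMaxConcurrency(); }

private:
  unsigned MaxConcurrency;
};
```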
Kazu Hirata	b7a66d0fae	[llvm] Use SmallString::operator std::string (NFC)	2024-01-19 18:54:11 -08:00
Teresa Johnson	329ba523cc	[LTO][NFC] Free the GlobalResolutions map after final use (#76780 ) The GlobalResolutions map was found to contribute ~9% of the peak memory of a large thin link. However, we are essentially done with it when we are about to compute cross module imports, which itself adds to the peak memory due to the import and export lists (there is one use just after importing but it can easily be moved before importing). Move the last use up above importing, and free the GlobalResolutions map after that (and before importing). To help guard against future inadvertent use after it has been released, change it to a std::optional.	2024-01-03 07:19:56 -08:00
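Here is a sketch of the `std::optional` approach. It assumes a hypothetical `LTOState` that holds a simplified resolution map; neither type is LLVM's real `LTO` class or `GlobalResolution` struct. Calling `reset()` destroys the map and frees its memory once it is no longer needed. Accessors can then assert that the map is still present, which guards against inadvertent use after release.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a per-symbol resolution record.
struct GlobalResolution {
  bool Prevailing = false;
};

struct LTOState {
  // Wrapped in std::optional so the map can be released early.
  std::optional<std::map<std::string, GlobalResolution>> GlobalResolutions{
      std::in_place};

  void markPrevailing(const std::string &Name) {
    assert(GlobalResolutions && "GlobalResolutions used after release");
    (*GlobalResolutions)[Name].Prevailing = true;
  }

  // Call after the last use, before the memory-hungry import phase.
  void releaseResolutions() { GlobalResolutions.reset(); }
};
```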
Martin Storsjö	89efffd463	[LTO] [LLD] Don't alias the __imp_func and func symbol resolutions (#71376 ) Commit b963c0b658cc54b370832df4f5a3d63fd69da334 fixed LTO compilation of cases where one translation unit is calling a function with the dllimport attribute, and another translation unit provides this function locally within the same linked module (i.e. not actually dllimported); see https://github.com/llvm/llvm-project/issues/37453 or https://bugs.llvm.org/show_bug.cgi?id=38105 for full context. This was fixed by aliasing their GlobalResolution structs for the `__imp_`-prefixed and non-prefixed symbols. I believe this fix to be wrong. This patch reverts that fix and fixes the same issue differently, within LLD instead. The fix assumed that one can treat the `__imp_`-prefixed and unprefixed symbols as equal, referencing SVN r240620 (d766653534e0cff702e42a43b44d3057b6094fea). However, that referenced commit had misunderstood how this logic works, which was corrected later in SVN r240622 (88e0f9206b4dccb56dee931adab08f89ff80525a); those symbols aren't direct aliases for each other, but if there's a need for the `__imp_`-prefixed one and the other one exists, the `__imp_`-prefixed one is created as a pointer to the other one. However, this fix only works if both translation units are compiled as LTO; if the caller is compiled as a regular object file and the callee is compiled as LTO, the fix fails, as the LTO compilation doesn't know that the unprefixed symbol is needed. The only level that knows of the potential relationship between the `__imp_`-prefixed and unprefixed symbol, across regular and bitcode object files, is LLD itself. 
Therefore, revert the original fix from b963c0b658cc54b370832df4f5a3d63fd69da334 and fix the issue differently: when concluding that we can fulfill an undefined symbol starting with `__imp_`, mark the corresponding non-prefixed symbol as used in a regular object for the LTO compilation. This makes sure that the non-prefixed symbol exists after the LTO compilation, so that LLD can do the fixup of the local import. Extend the testcase to test a regular object file calling an LTO object file, which previously failed. This change also fixes another issue: an object file can provide both unprefixed and prefixed versions of the same symbol, like this: `void importedFunc(void) { } void (*__imp_importedFunc)(void) = importedFunc;` That allows the function to be called both with and without dllimport markings. (Automatically resolving a reference to `__imp_func` to a locally defined `func` is done only in MSVC-style linkers, not in GNU ld; therefore MinGW-mode code often uses this construct.) Previously, the aliasing of global resolutions at the LTO level would trigger a failed assert with "Multiple prevailing defs are not allowed" for this case, as both `importedFunc` and `__imp_importedFunc` could be prevailing. Add a case to the existing LLD test case lto-imp-prefix.ll to test this as well. This change (together with the previous change in 3ab6209a3f93bdbeec8e9b9fcc00a9a4980915ff) completes the work needed for LLD to handle mingw-w64-crt files (the base glue code for a mingw-w64 toolchain) built with LTO.	2023-11-21 15:06:00 +02:00
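The LLD-side rule described above can be sketched over a hypothetical, simplified symbol table; `Symbol`, `SymbolTable`, and `resolveLocalImport` are illustrative names, not LLD's real classes. If an undefined `__imp_` reference can be satisfied by a local definition of the unprefixed symbol, that symbol is marked as used in a regular object. The LTO compilation then keeps it, and the linker can later synthesize the `__imp_` pointer to it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplified symbol table; not LLD's real data structures.
struct Symbol {
  bool Defined = false;
  bool UsedInRegularObj = false; // Tells LTO to keep the symbol alive.
};

using SymbolTable = std::map<std::string, Symbol>;

// Returns true if the undefined reference Name ("__imp_X") can be
// fulfilled by a local definition of X; in that case X is marked as
// used in a regular object so it survives the LTO compilation.
bool resolveLocalImport(SymbolTable &Syms, const std::string &Name) {
  const std::string Prefix = "__imp_";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false; // Not an __imp_ reference.
  auto It = Syms.find(Name.substr(Prefix.size()));
  if (It == Syms.end() || !It->second.Defined)
    return false; // No local definition to point at.
  It->second.UsedInRegularObj = true;
  return true;
}
```

Because the decision is made in the linker rather than by aliasing LTO resolutions, it works whether the caller is a regular object or bitcode.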
Youngsuk Kim	876236023c	[llvm] Remove no-op ptr-to-ptr bitcasts (NFC) (#72133 ) Opaque ptr cleanup effort (NFC).	2023-11-13 13:05:27 -05:00
Fangrui Song	2d854dd3e7	Move global namespace cl::opt inside llvm:: or internalize them	2023-10-10 19:58:03 -07:00

357 Commits