llvm-project

Author	SHA1	Message	Date
lifengxiang1025	e40cabfea4	[MemProf] Match function's summary and definition strictly (#83665 ) Problem description: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1933468520 Solution: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1934192548 (choose plan2)	2024-03-12 11:00:02 +08:00
Paul Kirth	2fef685363	[llvm][loop-rotate] Allow forcing loop-rotation (#82828 ) Many profitable optimizations cannot be performed at -Oz, due to unrotated loops. While this is worse for size (minimally), many of the optimizations significantly reduce code size, such as memcpy optimizations and other patterns found by loop idiom recognition. Related discussion can be found in issue #50308. This patch adds an experimental, backend-only flag to allow loop header duplication, regardless of the optimization level. Downstream consumers can experiment with this flag, and if it is profitable, we can adjust the compiler's defaults accordingly, and expose any useful frontend flags to opt into the new behavior.	2024-02-29 13:46:13 -08:00
Paul Kirth	777ac46ddb	[llvm] Remove pipeline checks for optsize for DFAJumpThreadingPass The pass itself checks whether to apply the optimization based on the minsize attribute, so there isn't much functional benefit to preventing the pass from being added. Gating whether the pass gets added to the pass pipeline also complicates the interaction with -enable-dfa-jump-thread. Reviewers: aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/83318	2024-02-28 11:12:13 -08:00
David Spickett	9c5ca6b0ce	Revert "Enable JumpTableToSwitch pass by default (#82546 )" This reverts commit 1069823ce7d154aa8ef87ae5a0fd34b527eca2a0. This has caused second stage timeouts when building Flang on AArch64: https://lab.llvm.org/buildbot/#/builders/179/builds/9442	2024-02-26 13:35:59 +00:00
Alexander Shaposhnikov	1069823ce7	Enable JumpTableToSwitch pass by default (#82546 ) Enable JumpTableToSwitch pass by default. Test plan: ninja check-all	2024-02-22 11:02:47 -08:00
Arthur Eubanks	93cdd1b5cf	[PGO] Add ability to mark cold functions as optsize/minsize/optnone (#69030 ) The performance of cold functions shouldn't matter too much, so if we care about binary sizes, add an option to mark cold functions as optsize/minsize for binary size, or optnone for compile times [1]. Clang patch will be in a future patch. This is intended to replace `shouldOptimizeForSize(Function&, ...)`. We've seen multiple cases where calls to this expensive function, if not careful, can blow up compile times. I will clean up users of that function in a followup patch. Initial version: https://reviews.llvm.org/D149800 [1] https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388	2024-02-12 14:52:08 -08:00
Alexander Shaposhnikov	d26b43ff4f	Add JumpTableToSwitch pass (#77709 ) Add a pass to convert jump tables to switches. The new pass replaces an indirect call with a switch + direct calls if all the functions in the jump table are smaller than the provided threshold. The pass is currently disabled by default and can be enabled by -enable-jump-table-to-switch. Test plan: ninja check-all	2024-02-10 01:12:46 -08:00
Paul Kirth	9d476e1e1a	[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061 ) Currently, the UnifiedLTO pipeline seems to have trouble with several LTO features, like SplitLTO units, which means we cannot use important optimizations like Whole Program Devirtualization or security hardening instrumentation like CFI. This patch reverts FatLTO to using distinct pipelines for Full LTO and ThinLTO. It still avoids module cloning, since that was error prone.	2024-01-23 14:04:52 -08:00
Mingming Liu	5ce286849a	[CGProfile] Use callee's PGO name when caller->callee is an indirect call. (#78610 ) - With PGO, indirect call edges are constructed using value profiles, and the profile address is mapped to a function's PGO name. The PGO name is computed using a function's linkage before LTO internalization or global promotion. - With ThinLTO, local functions [could be promoted](`2663d2cb9c/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp (L288)`) to have external linkage; and with [full](`2663d2cb9c/llvm/lib/LTO/LTO.cpp (L1328)`) or [thin](`2663d2cb9c/llvm/lib/LTO/LTO.cpp (L448)`) LTO, global functions could be internalized. Edge construction should use a function's PGO name before its linkage is updated.	2024-01-22 10:36:03 -08:00
Mircea Trofin	1d608fc755	[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970 ) Akin to other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`. A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.	2023-12-10 18:03:08 -08:00
Paul Kirth	cfe1ece833	[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180 ) https://github.com/llvm/llvm-project/issues/70703 pointed out that cloning LLVM modules could lead to miscompiles when using FatLTO. This is due to an existing issue when cloning modules with labels (see #55991 and #47769). Since this can lead to miscompilation, we can avoid cloning the LLVM modules, which was desirable anyway. This patch modifies the EmbedBitcodePass to no longer clone the module or run an input pipeline over it. Further, it makes FatLTO always perform UnifiedLTO, so we can still defer the Thin/Full LTO decision to link-time. Lastly, it removes dead/obsolete code related to now defunct options that do not work with the EmbedBitcodePass implementation any longer.	2023-11-30 17:09:34 -08:00
Tom Stellard	2750a22745	Passes: Consolidate EnableKnowledgeRetention declarations into a header file (#71695 )	2023-11-13 11:03:49 -08:00
dewen	3b82336188	Revert "[PM] Execute IndVarSimplifyPass precede RessociatePass" (#71617 ) Reverts llvm/llvm-project#71054	2023-11-08 09:22:55 +08:00
dewen	e4d27d7f32	[PM] Execute IndVarSimplifyPass precede RessociatePass (#71054 ) ReassociatePass may clear nsw/nuw flags of some instructions, which may have side effects on optimizations in IndVarSimplifyPass.	2023-11-08 09:21:17 +08:00
Teresa Johnson	87f5e22987	[MemProf] Tolerate missing leaf debug frames (#71233 ) Loosen up the matching so that a missing leaf debug frame in the profile does not prevent matching an allocation context if we can match further up the inlined call context. This relies on the pre-inliner, which was already the default when performing normal PGO feedback along with the MemProf feedback, but to ensure matching is not affected by the presence of PGO, enable the pre-inliner for MemProf feedback as well.	2023-11-03 21:01:07 -07:00
Nikita Popov	a682a9cfd0	Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235 )" This reverts commit 19b5495b653a00da7a250f48b4f739fcf2bbe82f. PR landed without approval, with severe quality issues.	2023-11-03 21:15:46 +01:00
Manman Ren	19b5495b65	Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235 ) See RFC for details: https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778 We will need to refactor extension to FunctionComparator/FunctionHash to StructuralHash. This patch adds a new pass which is ported from Swift, and will need to discuss on how to migrate Swift’s pass over after we land this in llvm. Create this PR to get some early review on the patch. --------- Co-authored-by: Manman Ren <mren@meta.com>	2023-11-03 11:13:58 -07:00
Amara Emerson	1a2e77cf9e	Revert "Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner."" This reverts commit 86bfeb906e3a95ae428f3e97d78d3d22a7c839f3. This is a long time coming re-application that was originally reverted due to regressions, unrelated to the actual inlining change. These regressions have since been fixed due to another long-in-the-making change of a66051c6 landing. Original commit message for reference: --- We have several situations where it's beneficial for code size to ensure that every call to an always-inline function is inlined before normal inlining decisions are made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this, it only does it on a per-SCC basis, rather than the whole module. Ensuring that all mandatory inlinings are done before any heuristic based decisions are made just makes sense. Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0. This also fixes an exponential compile time blow up in https://github.com/llvm/llvm-project/issues/59126 Differential Revision: https://reviews.llvm.org/D143624 ---	2023-10-28 23:21:11 -07:00
Alex Voicu	0ce6255a50	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation / deallocation that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-12 11:26:48 +01:00
Alex Voicu	25935c384d	Revert "[HIP][LLVM][Opt] Add LLVM support for `hipstdpar`" This reverts commit c5bba7ea5a05f540948f76a189c880eb24a5e8c6.	2023-10-11 12:27:03 +01:00
Alex Voicu	c5bba7ea5a	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation / deallocation that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-11 12:22:00 +01:00
Fangrui Song	2d854dd3e7	Move global namespace cl::opt inside llvm:: or internalize them	2023-10-10 19:58:03 -07:00
Alex Voicu	98eda5dda7	Revert "[HIP][LLVM][Opt] Add LLVM support for `hipstdpar`" in order to address build breakage. This reverts commit 9b98ebb0eb43b005921926a622177f10e13b1ac6.	2023-10-10 12:16:10 +01:00
Alex Voicu	9b98ebb0eb	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation / deallocation that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-10 12:02:05 +01:00
lcvon007	f3c417f341	[Passes] Add option for LoopVersioningLICM pass. (#67107 ) Users can currently only test the LoopVersioningLICM pass through opt. This PR adds the option back (it was deleted in https://reviews.llvm.org/D137915) so that it is easy to verify whether the pass is useful for some benchmarks.	2023-09-27 07:38:37 -05:00
Florian Hahn	04f9a8a7d6	[ConstraintElim] Move just before loop simplification pipeline. Adjust the pipeline slightly to move ConstraintElim just before the loop simplification pipeline. This increases the number of cases where SCEV can be preserved in the future. This also enables slightly more opportunities, by benefiting from earlier CFG simplifications, which allow more conditions to be added. Reviewed By: nikic, antoniofrighetto Differential Revision: https://reviews.llvm.org/D158843	2023-09-22 14:31:08 +01:00
Dhruv Chawla	515a826326	[NFC][InferAlignment] Swap extern declaration and definition of EnableInferAlignmentPass This prevents a linker issue when only InstCombine is linked without PassBuilder, like in the case of bugpoint.	2023-09-20 13:07:13 +05:30
Dhruv Chawla	3e992d81af	[InferAlignment] Enable InferAlignment pass by default This gives an improvement of 0.6%: https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u Differential Revision: https://reviews.llvm.org/D158600	2023-09-20 12:08:52 +05:30
Dhruv Chawla	0f152a55d3	[InferAlignment] Implement InferAlignmentPass This pass aims to infer alignment for instructions as a separate pass, to reduce redundant work done by InstCombine running multiple times. It runs late in the pipeline, just before the back-end passes where this information is most useful. Differential Revision: https://reviews.llvm.org/D158529	2023-09-20 12:03:36 +05:30
Nuno Lopes	281ae4903d	[Pipelines] Guard a few more usages of GlobalsAA under the EnableGlobalAnalyses flag	2023-09-07 13:58:28 +01:00
Qiongsi Wu	611ce24114	[PGO] Enable `-fprofile-update` for `-fprofile-generate` Currently, `-fprofile-update` is ignored when `-fprofile-generate` is in effect. This patch enables `-fprofile-update` for `-fprofile-generate`. This patch continues the work from https://reviews.llvm.org/D87737, which added `-fprofile-update` in the first place. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D157280	2023-08-15 10:10:03 -04:00
David Green	05b4310c8a	Revert "[Pipelines] Perform hoisting prior to GVN" This reverts commit 1f37088679a5c2416707d477093950e48148d430 as it causes a large regression in x264, and some other regressions in downstream embedded benchmarks under LTO.	2023-08-08 15:32:24 +01:00
Nikita Popov	1f37088679	[Pipelines] Perform hoisting prior to GVN We currently only enable hoisting in the last SimplifyCFG run of the function simplification pipeline. In particular this happens after GVN, which means that instructions that were identical (and thus hoistable) prior to GVN might no longer be so after it ran, due to equality replacements (see the phase ordering test). The history here is that D84108 restricted hoisting to the very late (module optimization) pipeline only. Then D101468 went back on that, and also performed it at the end of function simplification. This patch goes one step further and allows it prior to GVN. Importantly, we still don't perform hoisting before LoopRotate, which was the original motivation for delaying it. Differential Revision: https://reviews.llvm.org/D156532	2023-08-07 10:06:00 +02:00
Teresa Johnson	546ec641b4	Restore "[MemProf] Use new option/pass for profile feedback and matching" This restores commit b4a82b62258c5f650a1cccf5b179933e6bae4867, reverted in 3ab7ef28eebf9019eb3d3c4efd7ebfd160106bb1 because it was thought to cause a bot failure, which ended up being unrelated to this patch set. Differential Revision: https://reviews.llvm.org/D154856	2023-07-11 13:16:20 -07:00
JP Lehr	3ab7ef28ee	Revert "[MemProf] Use new option/pass for profile feedback and matching" This reverts commit b4a82b62258c5f650a1cccf5b179933e6bae4867. Broke AMDGPU OpenMP Offload buildbot	2023-07-11 05:44:42 -04:00
Teresa Johnson	b4a82b6225	[MemProf] Use new option/pass for profile feedback and matching Previously the MemProf profile was expected to be in the same profile file as a normal PGO profile, passed via the usual -fprofile-use= option, and was matched in the same pass. To simplify profile preparation, since the raw MemProf profile requires the binary for symbolization and may be simpler to index separately from the raw PGO profile, and also to enable providing a MemProf profile for a SamplePGO build, separate out the MemProf feedback option and matching pass. This patch adds the -fmemory-profile-use=${file} option, and the provided file is passed down to LLVM and ultimately used in a new MemProfUsePass which performs the matching of just the memory profile contents of that file. Note that a single profile file containing both normal PGO and MemProf profile data is still supported, and the relevant profile data is matched by the appropriate matching pass(es) based on which option(s) the profile is provided with (the same profile file can be supplied to both feedback options). Differential Revision: https://reviews.llvm.org/D154856	2023-07-10 16:42:56 -07:00
David Sherwood	905083f3c1	[LTO] Ensure LICM hoists expensive fdiv instructions introduced by InstCombine In the LTO pipeline we run InstCombine after LICM, which is different to what we normally do without LTO. This has the effect of undoing all the great work done by LICM to reduce the cost of the loop when it hoists the fdiv out and replaces it with fmul. When InstCombine runs after LICM it puts the fdiv straight back which, on AArch64 at least, is darn expensive. You can observe this problem in the SPEC2017 benchmark parest if you build with "-Ofast -flto" and the loop-vectoriser uses an unroll factor of 1, which is what often happens when tail-folding is enabled. This is also a problem for scalar loops, or indeed any loop where there is only one use of the preheader fdiv result in the loop. See InstCombinerImpl::visitFMul for the code that sinks the fdiv. I've attempted to fix this by adding another LICM pass for Full LTO after InstCombine. The alternative is to stop InstCombine from sinking the fdiv into loops. See D87479 for a previous discussion on this issue. Differential Revision: https://reviews.llvm.org/D143631	2023-07-07 11:06:24 +00:00
Matthew Voss	a1ca3af31e	[llvm] A Unified LTO Bitcode Frontend Here's a high level summary of the changes in this patch. For more information on the rationale, see the RFC (https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774). - Add config parameter to LTO backend, specifying which LTO mode is desired when using unified LTO. - Add unified LTO flag to the summary index for efficiency. Unified LTO modules can be detected without parsing the module. - Make sure that the ModuleID is generated by incorporating more types of symbols. Differential Revision: https://reviews.llvm.org/D123803	2023-07-05 14:53:14 -07:00
Paul Kirth	75a1797044	Reland [llvm] Preliminary fat-lto-objects support Fat LTO objects contain both LTO compatible IR, as well as generated object code. This allows users to defer the choice of whether to use LTO or not to link-time. This is a feature available in GCC for some time, and makes the existing -ffat-lto-objects flag functional in the same way as GCC's. Within LLVM, we add a new EmbedBitcodePass that serializes the module to the object file, and expose a new pass pipeline for compiling fat objects. The new pipeline initially clones the module and runs the selected (Thin)LTOPrelink pipeline, after which it will serialize the module into a `.llvm.lto` section of an ELF file. When compiling for (Thin)LTO, this is normally the point at which the compiler would emit an object file containing the bitcode and metadata. After that point we compile the original module using the PerModuleDefaultPipeline used for non-LTO compilation. We generate standard object files at the end of this pipeline, which contain machine code and the new `.llvm.lto` section containing bitcode. Since the two pipelines operate on different copies of the module, we can be sure that the bitcode in the `.llvm.lto` section and object code in `.text` are congruent with the existing output produced by the default and LTO pipelines. Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977 Earlier versions of this patch were missing REQUIRES lines for llc related tests in Transforms/EmbedBitcode. Those tests are now under CodeGen/X86, which should avoid running the check on unsupported platforms. The EmbedBitcodePass also returned PreservedAnalyses::all when adding a metadata section, which failed expensive checks, since it modified the module. This is now corrected. Reviewed By: tejohnson, MaskRay, nikic Differential Revision: https://reviews.llvm.org/D146776	2023-06-28 21:37:50 +00:00
Alex Brachet	6085eb3084	Revert "Reland [llvm] Preliminary fat-lto-objects support" This reverts commit 44265dc3554ef40920b587eeb787a400663af6c7.	2023-06-24 01:15:50 +00:00
Teresa Johnson	200cc952a2	[LTO][GlobalDCE] Use pass parameter instead of module flag for LTO phase D63932 added a module flag to indicate that we are executing the regular LTO post merge pipeline, so that GlobalDCE could perform more aggressive optimization for Dead Virtual Function Elimination. This caused issues trying to reuse bitcode that had already been through the LTO pipeline (see context in D139816). Instead support this by passing down a parameter flag to the GlobalDCEPass constructor, which is the more usual way for indicating this information. Most test changes are to remove incidental uses of this flag. Of the 2 real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and removed in this patch, and the virtual-functions-visibility-post-lto.ll test is updated to use the regular LTO default pipeline where this parameter is set to true. Differential Revision: https://reviews.llvm.org/D153655	2023-06-23 17:05:07 -07:00
Paul Kirth	44265dc355	Reland [llvm] Preliminary fat-lto-objects support Fat LTO objects contain both LTO compatible IR, as well as generated object code. This allows users to defer the choice of whether to use LTO or not to link-time. This is a feature available in GCC for some time, and makes the existing -ffat-lto-objects flag functional in the same way as GCC's. Within LLVM, we add a new EmbedBitcodePass that serializes the module to the object file, and expose a new pass pipeline for compiling fat objects. The new pipeline initially clones the module and runs the selected (Thin)LTOPrelink pipeline, after which it will serialize the module into a `.llvm.lto` section of an ELF file. When compiling for (Thin)LTO, this is normally the point at which the compiler would emit an object file containing the bitcode and metadata. After that point we compile the original module using the PerModuleDefaultPipeline used for non-LTO compilation. We generate standard object files at the end of this pipeline, which contain machine code and the new `.llvm.lto` section containing bitcode. Since the two pipelines operate on different copies of the module, we can be sure that the bitcode in the `.llvm.lto` section and object code in `.text` are congruent with the existing output produced by the default and LTO pipelines. Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977 Earlier versions of this patch were missing REQUIRES lines for llc related tests in Transforms/EmbedBitcode. Those tests are now under CodeGen/X86, which should avoid running the check on unsupported platforms. Reviewed By: tejohnson, MaskRay, nikic Differential Revision: https://reviews.llvm.org/D146776	2023-06-23 23:23:58 +00:00
Paul Kirth	a3800ad9d8	Revert "[llvm] Preliminary fat-lto-objects support" There seems to be a problem on arm buildbots. Reverting until I can investigate. https://lab.llvm.org/buildbot#builders/245/builds/10184 This reverts commit a67208e1c697649ce432e6497f56a93675273dd8 and dependent commit e54a3112cee5ae0a9117359ecbea878e1388f51e.	2023-06-23 18:43:41 +00:00
Paul Kirth	a67208e1c6	[llvm] Preliminary fat-lto-objects support Fat LTO objects contain both LTO compatible IR, as well as generated object code. This allows users to defer the choice of whether to use LTO or not to link-time. This is a feature available in GCC for some time, and makes the existing -ffat-lto-objects flag functional in the same way as GCC's. Within LLVM, we add a new EmbedBitcodePass that serializes the module to the object file, and expose a new pass pipeline for compiling fat objects. The new pipeline initially clones the module and runs the selected (Thin)LTOPrelink pipeline, after which it will serialize the module into a `.llvm.lto` section of an ELF file. When compiling for (Thin)LTO, this is normally the point at which the compiler would emit an object file containing the bitcode and metadata. After that point we compile the original module using the PerModuleDefaultPipeline used for non-LTO compilation. We generate standard object files at the end of this pipeline, which contain machine code and the new `.llvm.lto` section containing bitcode. Since the two pipelines operate on different copies of the module, we can be sure that the bitcode in the `.llvm.lto` section and object code in `.text` are congruent with the existing output produced by the default and LTO pipelines. Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977 Reviewed By: tejohnson, MaskRay, nikic Differential Revision: https://reviews.llvm.org/D146776	2023-06-23 17:51:30 +00:00
Teresa Johnson	f354e971b0	[MemProf] Clean up MemProf instrumentation pass invocation First, removes the invocation of the memprof instrumentation passes from the end of the module simplification pass builder, where it doesn't really belong. However, it turns out that this was never being invoked, as it is guarded by an internal option not used anywhere (even tests). These passes are actually added via clang under the -fmemory-profile option. Changed this to add via the EP callback interface, similar to the sanitizer passes. They are added to the EP for the end of the optimization pipeline, which is roughly where they were being added already (end of the pre-LTO link pipelines and non-LTO optimization pipeline). Ideally we should plumb the output file through to LLVM and set it up there, so I have added a TODO. Differential Revision: https://reviews.llvm.org/D151593	2023-05-26 17:38:49 -07:00
Arthur Eubanks	13e3d4aa5a	[Pipeline] Don't run EarlyFPM in LTO post link EarlyFPM cleans up the output of the frontend. This isn't necessary in post link pipelines as the pre link pipeline already ran this. ~0.4% savings in ThinLTO builds: https://llvm-compile-time-tracker.com/compare.php?from=8a5d4eb775c644d8683f24817d44c510d2b853b7&to=3580252a2162eadca0da99f1eeaa112f74a0353d&stat=instructions:u Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D145403	2023-05-25 09:32:54 -07:00
Nikita Popov	3060ee0c6a	[Pipelines] Don't skip GlobalDCE in ThinLTO pre-link GlobalDCE will only remove functions with available externally linkage if they are unreferenced. As such, I don't believe there is any problem with running this pass as part of the ThinLTO pre-link pipeline. It will only remove functions that are completely dead in that module, and I don't think there is any benefit to keeping them around for the post-link phase. There is no compile-time impact from the additional pass. This is a followup to one of the side discussions in D146776. Differential Revision: https://reviews.llvm.org/D149446	2023-05-15 14:58:24 +02:00
Teresa Johnson	cfad2d3a3d	[MemProf] Context disambiguation cloning pass [patch 4/4] Applies ThinLTO cloning decisions made during the thin link and recorded in the summary index to the IR during the ThinLTO backend. Depends on D141077. Differential Revision: https://reviews.llvm.org/D149117	2023-05-05 16:26:32 -07:00
Shoaib Meenai	141be5c062	Revert "Reland [Pipeline] Don't limit ArgumentPromotion to -O3" This reverts commit 6f29d1adf29820daae9ea7a01ae2588b67735b9e. https://reviews.llvm.org/D149768 is causing size regressions for -Oz with FullLTO, and I'm reverting that one while investigating. This commit depends on that one, so it needs to be reverted as well.	2023-05-05 14:26:57 -07:00
Arthur Eubanks	6f29d1adf2	Reland [Pipeline] Don't limit ArgumentPromotion to -O3 This is a cheap pass so there's no need to limit to -O3. This removes some differences between various pipelines. Code size regressions should be addressed with https://reviews.llvm.org/D149768. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D148269	2023-05-03 13:17:30 -07:00

170 Commits