827 Commits

Author SHA1 Message Date
Nikita Popov
1579e9ca9c Revert "Run ObjCContractPass in Default Codegen Pipeline (#92331)"
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2.
This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7.

Causes major compile-time regressions for unoptimized builds.
2024-05-24 08:14:26 +02:00
Nuri Amari
8cc8e5d6c6
Run ObjCContractPass in Default Codegen Pipeline (#92331)
Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing ARC intrinsics. This patch is motivated by that use case.

The pass was previously added at the various places where codegen is performed. This patch adds the pass to the default codegen pipeline, makes sure it bails out immediately if no ARC intrinsics are found, and removes the ad hoc scheduling of the pass.

Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-05-23 10:04:55 -07:00
Jacob Lambert
11a6799740
[clang][CodeGen] Omit pre-opt link when post-opt link is requested (#85672)
Currently, when the -relink-builtin-bitcodes-postop option is used, we
link builtin bitcodes twice: once before optimization, and again after
optimization.

With this change, we omit the pre-opt linking when the option is set,
and we rename the option to the following:

  -Xclang -mlink-builtin-bitcodes-postopt
 (-Xclang -mno-link-builtin-bitcodes-postopt) 

The goal of this change is to reduce compile time. We do lose the
theoretical benefits of pre-opt linking, but in practice these are smaller
than the overhead of linking twice. However, we may be able to address
this in a future patch by adjusting the position of the builtin-bitcode
linking pass.

Compilations not setting the option are unaffected.
2024-05-08 08:11:15 -07:00
Petr Hosek
8bcb073705
[Clang] -fseparate-named-sections option (#91028)
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
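A minimal usage sketch (file names hypothetical): compile with the new flag so
that each symbol placed in a named section gets its own unique section, then
let the linker garbage-collect the unused ones:

  clang -ffunction-sections -fseparate-named-sections -c foo.c
  clang foo.o -o foo -Wl,--gc-sections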
2024-05-07 09:18:55 -07:00
Arthur Eubanks
1b79a34a56
[clang] Add flag to experiment with cold function attributes (#89298)
To be removed and promoted to a proper driver flag if experiments turn
out fruitful.

For now, this can be experimented with `-mllvm
-pgo-cold-func-opt=[optsize|minsize|optnone|default] -mllvm
-enable-pgo-force-function-attrs`.

Original LLVM patch for this functionality: #69030
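For example, with an existing PGO profile (profile and source file names hypothetical):

  clang -O2 -fprofile-use=code.profdata \
    -mllvm -pgo-cold-func-opt=optsize \
    -mllvm -enable-pgo-force-function-attrs -c foo.c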
2024-04-19 09:30:47 -07:00
Vitaly Buka
49f0b536fd
[UBSAN] Rename remove-traps to lower-allow-check (#84853) 2024-04-04 21:29:46 -07:00
Vitaly Buka
b76eb1ddfb
[clang][CodeGen] Remove SimplifyCFGPass preceding RemoveTrapsPass (#84852)
There is no performance difference after switching to
`llvm.experimental.hot`.
2024-04-04 17:47:16 -07:00
Vitaly Buka
18380c522a
[UBSAN][HWASAN] Remove redundant flags (#87709)
Presence of `cutoff-hot` or `random-skip-rate`
should be enough to trigger the optimization.
2024-04-04 14:32:30 -07:00
Vitaly Buka
633bc3bfda [CodeGen][NFC] Make an opt<> static 2024-04-02 15:47:04 -07:00
Vitaly Buka
e93b5f5a47 [ubsan][NFC] Remove recently added cl::init(false)
Extracted from #84858
2024-04-01 13:37:42 -07:00
Vitaly Buka
eaa71a97f9
[clang] Add optional pass to remove UBSAN traps using PGO (#84214)
With #83471 it reduces UBSAN overhead from 44% to 6%.
Measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks"
with PGO build.

On a real large server binary, we see that 95% of the code is still instrumented,
with UBSAN overhead improving from 10% to 1.5%. We can pass this test only
with a subset of UBSAN checks, so the base overhead is smaller.

We have follow-up patches to improve it even further.
2024-03-11 12:07:28 -07:00
Fangrui Song
a331937197 [MC] Move CompressDebugSections/RelaxELFRelocations from TargetOptions/MCAsmInfo to MCTargetOptions
The convention is for such MC-specific options to reside in
MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do
not follow the convention: `CompressDebugSections` is defined in both
TargetOptions and MCAsmInfo and there is forwarding complexity.

Move the option to MCTargetOptions and hereby simplify the code. Rename
the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc
-relax-relocations and llc -x86-relax-relocations can now be unified.
2024-03-06 23:19:59 -08:00
Paul Kirth
7d8b50aaab
[clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (#83159)
In addition to being rather hard to follow, there isn't a good reason
why FatLTO shouldn't just share the same code for setting module flags
for (Thin)LTO. This patch simplifies the logic and makes sure we set
these flags in a consistent way, independent of FatLTO.

Additionally, we now test that output in the .llvm.lto section actually
matches the output from Full and Thin LTO compilation.
2024-02-28 19:11:55 -08:00
Arthur Eubanks
93cdd1b5cf
[PGO] Add ability to mark cold functions as optsize/minsize/optnone (#69030)
The performance of cold functions shouldn't matter too much, so if we
care about binary sizes, add an option to mark cold functions as
optsize/minsize for binary size, or optnone for compile times [1]. The
corresponding Clang patch will follow separately.

This is intended to replace `shouldOptimizeForSize(Function&, ...)`.
We've seen multiple cases where calls to this expensive function, if not
careful, can blow up compile times. I will clean up users of that
function in a followup patch.

Initial version: https://reviews.llvm.org/D149800

[1]
https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388
2024-02-12 14:52:08 -08:00
Rahman Lavaee
acec6419e8
[SHT_LLVM_BB_ADDR_MAP] Allow basic-block-sections and labels to be used together by decoupling the handling of the two features. (#74128)
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous, which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
    
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.

First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.

| Field | Name |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1 | |
| BB entry 2 | |
| ... | |
| BB entry #NumBlocks | |
    
To make this work for basic block sections, we treat each basic block
section like a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.

We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section as before: the base address of the section, its
number of blocks, and BB entries for its basic blocks. The first section
in the BB address map is always the function entry section.
| Field | Name |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1 | |
| ... | |
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges | NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges | |
    
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
    
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOptions field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compilation
unit, or for none (depending on `TargetOptions::BBAddrMap`).
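A minimal usage sketch (file name hypothetical); the second step assumes
llvm-readobj's --bb-addr-map dumper for inspecting the emitted section:

  clang -O2 -fbasic-block-address-map -c foo.c
  llvm-readobj --bb-addr-map foo.o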
    
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.

We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handling into their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
  - New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
`-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
2024-02-01 17:50:46 -08:00
Fangrui Song
36b4a9ccd9
[Driver,CodeGen] Support -mtls-dialect= (#79256)
GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values:

* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)

AArch64 toolchains seem to have supported TLSDESC from the beginning, and the
general-dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= for AArch64 at all, so we don't bother with it.
There also seems to be very little interest in AArch32's TLSDESC support.
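For example, on RISC-V (target triple and file name are illustrative), selecting
TLSDESC per the list above:

  clang --target=riscv64-linux-gnu -fPIC -mtls-dialect=desc -S tls.c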

TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.

There seems to be no motivation for fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm and don't bother with a module flag
metadata or function attribute.

Co-authored-by: Paul Kirth <paulkirth@google.com>
2024-01-26 09:25:38 -08:00
Paul Kirth
9d476e1e1a
[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like split LTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.

This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
2024-01-23 14:04:52 -08:00
smanna12
bbe1b06fbb
[NFC][CLANG] Fix static analyzer bugs about unnecessary object copies with auto keyword (#75082)
Reported by Static Analyzer Tool:

In EmitAssemblyHelper::RunOptimizationPipeline(): using the auto
keyword without an & causes a copy of an object of type std::function.

 /// List of pass builder callbacks ("CodeGenOptions.h").
std::vector<std::function<void(llvm::PassBuilder &)>>
PassBuilderCallbacks;
2023-12-22 20:39:22 -06:00
Paul Kirth
d1e2b96b60
[clang][fatlto] Don't set ThinLTO module flag with FatLTO (#75079)
Since FatLTO now uses the UnifiedLTO pipeline, we should not set the
ThinLTO module flag to true, since it may cause an assertion failure.
See https://github.com/llvm/llvm-project/issues/70703 for context.
2023-12-18 13:03:13 -08:00
Zequan Wu
ab3430f891
[Profile] Add binary profile correlation for code coverage. (#69493)
## Motivation
Since we don't need the metadata sections at runtime, we can avoid
keeping them in memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it would be hard to add such artificial
debug info for every function into CodeView, which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform-independent option.

## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contain only headers + counters.
llvm-profdata can be used to correlate raw profiles with the unstripped
binary to generate an indexed profile.

## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation was reduced from 64M to 38.8M (39.4%) and
the raw profile file size was reduced from 128M to 68M (46.9%):
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```

A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet,
so when a raw profile doesn't belong to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to merge only the raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and then
selectively pass the raw profiles with a matching binary id, together with
the binary, to llvm-profdata for merging.
2. In COFF, the new sections are currently still loaded into memory but not
used. I didn't change that in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are also loaded into memory. A separate patch will address it.
3. This should work with PGO when value profiling is disabled, as debug
info correlation currently does, though I haven't tested this yet.
2023-12-14 14:16:38 -05:00
Mircea Trofin
1d608fc755
[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970)
Akin to other passes, this refactors the name to `InstrProfilingLoweringPass` to better communicate what it does, and splits the pass part from the transformation part to avoid needing to initialize object state during `::run`.

A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
2023-12-10 18:03:08 -08:00
Paul Kirth
cfe1ece833
[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180)
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it makes FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link time. Lastly, it removes dead/obsolete code related to now-defunct
options that do not work with the EmbedBitcodePass implementation any
longer.
2023-11-30 17:09:34 -08:00
Jacob Lambert
8a4b90321f
[CodeGen] Add conditional to module cloning in bitcode linking (#72478)
Now that we have a command-line option dictating a second link step
(-relink-builtin-bitcode-postop), we can make module creation conditional
when linking in bitcode modules. This aims to improve performance by
avoiding unnecessary linking.
2023-11-29 16:44:40 -08:00
Tom Eccles
a207e6307a
[flang] add fveclib flag (#71734)
-fveclib= allows users to choose a vectorized libm so that loops
containing math functions are vectorized.

This is implemented as much as possible in the same way as in clang. The
driver test in veclib.f90 is copied from the clang test.
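A usage sketch (file name hypothetical; depending on the release the driver
binary may be spelled flang-new or flang):

  flang-new -O2 -fveclib=ArmPL -c loops.f90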
2023-11-13 10:04:50 +00:00
Fangrui Song
557b064dbf Reorder HipStdPar.h after D155856. NFC 2023-11-09 23:10:49 -08:00
Jacob Lambert
c6cf329502
[CodeGen] Implement post-opt linking option for builtin bitcodes (#69371)
In this patch, we create a new ModulePass that mimics the LinkInModules
API from CodeGenAction.cpp, and a new command line option to enable the
pass. As part of the implementation, we needed to refactor the
BackendConsumer class definition into a new separate header (instead of
embedded in CodeGenAction.cpp). With this new pass, we can now re-link
bitcodes supplied via the -mlink-builtin-bitcode option as part of
RunOptimizationPipeline.

With the re-linking pass, we now handle cases where new device library
functions are introduced as part of the optimization pipeline.
Previously, these newly introduced functions (for example a fused sincos
call) would result in a linking error due to a missing function
definition. This new pass can be initiated via:

      -mllvm -relink-builtin-bitcode-postop

Also note we intentionally exclude bitcodes supplied via the
-mlink-bitcode-file option from the second linking step.
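A usage sketch (bitcode and source file names hypothetical), combining a
builtin bitcode with the new post-optimization relink:

  clang -O3 -Xclang -mlink-builtin-bitcode -Xclang device_libs.bc \
    -mllvm -relink-builtin-bitcode-postop -c kernel.c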
2023-11-08 10:53:49 -08:00
William Moses
3c11ac5118
[Clang] Add codegen option to add passbuilder callback functions (#70171) 2023-11-06 22:26:38 -06:00
Stefan Pintilie
423ad04c67
[PowerPC] Add an alias for -mregnames so that full register names are used in assembly. (#70255)
This option already exists in GCC, so it is being added to LLVM so
that both compilers accept the same option.
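For instance (target triple and file name are illustrative):

  clang --target=powerpc64le-linux-gnu -S -mregnames foo.c -o foo.s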
2023-11-06 12:30:19 -05:00
Zequan Wu
db7a1ed9a2 Revert "[Profile] Refactor profile correlation. (#70712)"
This reverts commit 4b383d0af93136b80841fc140da0823dfc441dd4.
2023-10-31 10:53:45 -04:00
Zequan Wu
4b383d0af9
[Profile] Refactor profile correlation. (#70712)
Refactor some code from https://github.com/llvm/llvm-project/pull/69493.

Rebase of https://github.com/llvm/llvm-project/pull/69656 on top of main
as it was messed up.
2023-10-31 10:41:01 -04:00
Alex Voicu
dd5d65adb6 [HIP][Clang][CodeGen] Add CodeGen support for hipstdpar
This patch adds the CodeGen changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. This change relaxes restrictions on what gets emitted on the device path, when compiling in `hipstdpar` mode:

1. Unless a function is explicitly marked `__host__`, it will get emitted, whereas before only `__device__` and `__global__` functions would be emitted;
2. Unsupported builtins are ignored as opposed to being marked as an error, as the decision on their validity is deferred to the `hipstdpar` specific code selection pass;
3. We add a `hipstdpar` specific pass to the opt pipeline, independent of optimisation level:
    - When compiling for the host, iff the user requested it via the `--hipstdpar-interpose-alloc` flag, we add a pass which replaces canonical allocation / deallocation functions with accelerator aware equivalents.

A test to validate that unannotated functions get correctly emitted is added as well.
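A compile sketch, assuming an AMDGPU target and a standard ROCm/forwarding-header
setup (file name and architecture are illustrative):

  clang++ --hipstdpar --offload-arch=gfx90a --hipstdpar-interpose-alloc \
    par_transform.cpp -o par_transform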

Reviewed by: yaxunl, efriedma

Differential Revision: https://reviews.llvm.org/D155850
2023-10-17 11:41:36 +01:00
Matheus Izvekov
1ab68d708a
[clang][codegen] Add a verifier IR pass before any further passes. (#68015)
This helps check that clang generated valid IR in the first place, as
otherwise invalid IR can cause UB in subsequent passes, with the final
verification pass not catching the issue.

This for example would have helped catch
https://github.com/llvm/llvm-project/issues/67937 earlier.
2023-10-03 18:05:54 +02:00
Tom Stellard
e7247f1010 [profiling] Move option declarations into headers
This will make it possible to add visibility attributes to these
variables.  This also fixes some type mismatches between the
declaration and the definition.

Reviewed By: bogner, huangjd

Differential Revision: https://reviews.llvm.org/D156599
2023-09-30 18:51:28 -07:00
Arthur Eubanks
a42787d108
[clang] Add -mlarge-data-threshold for x86_64 medium code model (#66839)
Error if not used with x86_64.
Warn if not used with the medium code model (can update if other code
models end up using this).

Set TargetMachine option and add module flag.
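For example (threshold value and file name are arbitrary):

  clang --target=x86_64-unknown-linux-gnu -mcmodel=medium \
    -mlarge-data-threshold=65536 -c big_data.c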
2023-09-26 09:44:31 -07:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Joshua Cranmer
bf49237103 [Clang] Enable -print-pipeline-passes in clang.
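A usage sketch, assuming the option is reached through -mllvm like other LLVM
cl::opt flags:

  clang -O2 -mllvm -print-pipeline-passes -c foo.c -o /dev/null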
Reviewed By: arsenm, aeubanks

Differential Revision: https://reviews.llvm.org/D127221
2023-09-13 08:57:10 -07:00
Paul T Robinson
a4605af26f
[CodeGen][LTO] Rename some misleading variables (#65185)
Some flags named "IsLTO" and "IsThinLTO" implied they described
compilation modes, but with Unified LTO this is no longer true. Rename
these to "PrepForXXX" to be less confusing to readers. Also, deleted
"IsThinOrUnifiedLTO" because Unified implies PrepareForThinLTO.
2023-09-05 08:57:16 -07:00
Takuya Shimizu
01b88dd66d [NFC] Remove unused variables declared in conditions
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`.
This patch is an NFC fix to suppress the new warning in llvm, clang, and lld builds so that the above patch can pass CI.

Differential Revision: https://reviews.llvm.org/D158016
2023-08-30 10:05:06 +09:00
Zequan Wu
f52f8e817e [Profile] Allow online merging with debug info correlation.
When using debug info correlation, value profiling needs to be switched off,
so we are only merging counter sections. In that case the existence of the data
section is just used to provide an extra check in case of a corrupted profile.

This patch performs counter merging by iterating over the counter section by
counter size and adding the counters together.

Reviewed By: ellis, MaskRay

Differential Revision: https://reviews.llvm.org/D157632
2023-08-28 16:35:41 -04:00
Vitaly Buka
cd591e02d4 Revert "Reland "[Profile] Allow online merging with debug info correlation.""
Breaks windows https://lab.llvm.org/buildbot/#/builders/127/builds/54239

This reverts commit d80992032fd0d7da70b2bd1a59066703c3c21fbb.
2023-08-28 13:01:25 -07:00
Zequan Wu
d80992032f Reland "[Profile] Allow online merging with debug info correlation."
Original patch: D157632.

Fixed the issue when data is 0 by relaxing the condition.

Reviewed By: ellis

Differential Revision: https://reviews.llvm.org/D159000
2023-08-28 13:32:55 -04:00
Arthur Eubanks
9b6b6bbc91 Revert "[Profile] Allow online merging with debug info correlation."
This reverts commit cf2cf195d5fb0a07c122c5c274bd6deb0790e015.
This breaks merging profiles when nothing is instrumented, see comments in https://reviews.llvm.org/D157632.

This also reverts follow-up commit bfc965c54fdbfe33a0785b45903b53fc11165f13.
2023-08-22 14:34:46 -07:00
Qiongsi Wu
611ce24114 [PGO] Enable -fprofile-update for -fprofile-generate
Currently, `-fprofile-update` is ignored when `-fprofile-generate` is in effect. This patch enables `-fprofile-update` for `-fprofile-generate`. It continues the work from https://reviews.llvm.org/D87737, which added `-fprofile-update` in the first place.
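For example, requesting atomic counter updates together with IR instrumentation
(file name hypothetical):

  clang -O2 -fprofile-generate -fprofile-update=atomic foo.c -o foo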

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D157280
2023-08-15 10:10:03 -04:00
Zequan Wu
cf2cf195d5 [Profile] Allow online merging with debug info correlation.
When using debug info correlation, value profiling needs to be switched off,
so we are only merging counter sections. In that case the existence of the data
section is just used to provide an extra check in case of a corrupted profile.

This patch performs counter merging by iterating over the counter section by
counter size and adding the counters together.

Reviewed By: ellis, MaskRay

Differential Revision: https://reviews.llvm.org/D157632
2023-08-14 15:15:01 -04:00
Teresa Johnson
65e57bbed0 [FunctionImport] Reduce string duplication (NFC)
The import/export maps, and the ModuleToDefinedGVSummaries map, are all
indexed by module paths, which are StringRef obtained from the module
summary index, which already has a data structure that owns these
strings (the ModulePathStringTable). Because these other maps are also
StringMap, which makes a copy of the string key, we were keeping
multiple extra copies of the module paths, leading to memory overhead.

Change these to DenseMap keyed by StringRef, and document that the
strings are owned by the index.

The only exception is the llvm-link tool which synthesizes an import list
from command line options, and I have added a string cache to maintain
ownership there.

I measured around 5% memory reduction in the thin link of a large
binary.

Differential Revision: https://reviews.llvm.org/D156580
2023-08-04 14:43:11 -07:00
Paul Kirth
610fc5cbcc [clang] Preliminary fat-lto-object support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not until link time. This feature has been available in GCC for some time;
this patch makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

This patch adds support for that flag in the driver, as well as setting the
necessary codegen options for the backend. Largely, this means we select
the newly added pass pipeline for generating fat objects.

Users are expected to pass -ffat-lto-objects to clang in addition to one
of the -flto variants. Without the -flto flag, -ffat-lto-objects has no
effect.

// Compile and link. Use the object code from the fat object w/o LTO.
clang -fno-lto -ffat-lto-objects -fuse-ld=lld foo.c

// Compile and link. Select full LTO at link time.
clang -flto -ffat-lto-objects -fuse-ld=lld foo.c

// Compile and link. Select ThinLTO at link time.
clang -flto=thin -ffat-lto-objects -fuse-ld=lld foo.c

// Compile and link. Use ThinLTO  with the UnifiedLTO pipeline.
clang -flto=thin -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c

// Compile and link. Use full LTO  with the UnifiedLTO pipeline.
clang -flto -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c

// Link separately, using ThinLTO.
clang -c -flto=thin -ffat-lto-objects foo.c
clang -flto=thin -fuse-ld=lld foo.o -ffat-lto-objects  # pass --lto=thin --fat-lto-objects to ld.lld

// Link separately, using full LTO.
clang -c -flto -ffat-lto-objects foo.c
clang -flto -fuse-ld=lld foo.o  # pass --lto=full --fat-lto-objects to ld.lld

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Depends on D146776

Reviewed By: tejohnson, MaskRay

Differential Revision: https://reviews.llvm.org/D146777
2023-07-17 16:26:21 +00:00
Maciej Gabka
5b0e19a7ab [TLI][AArch64] Add mappings to vectorized functions from ArmPL
Arm Performance Libraries contain a math library which provides
vectorized versions of common math functions.
This patch allows using it with clang and llvm via -fveclib=ArmPL or
-vector-library=ArmPL, so loops with such calls can be vectorized.
The executable needs to be linked with the amath library.
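A usage sketch (file name hypothetical); the link step pulls in the amath
library mentioned above:

  clang -O3 -fveclib=ArmPL -c mathloops.c
  clang mathloops.o -lamath -lm -o mathloops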

Arm Performance Libraries are available at:
https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries

Reviewed by: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D154508
2023-07-12 12:53:18 +00:00
Matthew Voss
048a0c2469 [clang] Support Unified LTO Bitcode Frontend
The unified LTO pipeline creates a single LTO bitcode structure that can
be used by Thin or Full LTO. This means that the LTO mode can be chosen
at link time and that all LTO bitcode produced by the pipeline is
compatible, from an optimization perspective. This makes the behavior of
LTO a bit more predictable by normalizing the set of LTO features
supported by each LTO bitcode file.

Example usage:

  # Compile and link. Select regular LTO at link time.
  clang -flto -funified-lto -fuse-ld=lld foo.c

  # Compile and link. Select ThinLTO at link time.
  clang -flto=thin -funified-lto -fuse-ld=lld foo.c

  # Link separately, using ThinLTO.
  clang -c -flto -funified-lto foo.c  # -flto={full,thin} are identical in
  terms of compilation actions
  clang -flto=thin -fuse-ld=lld foo.o  # pass --lto=thin to ld.lld

  # Link separately, using regular LTO.
  clang -c -flto -funified-lto foo.c
  clang -flto -fuse-ld=lld foo.o  # pass --lto=full to ld.lld

The RFC discussing the details and rationale for this change is here:
https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
2023-07-11 15:13:57 -07:00
Teresa Johnson
546ec641b4 Restore "[MemProf] Use new option/pass for profile feedback and matching"
This restores commit b4a82b62258c5f650a1cccf5b179933e6bae4867, reverted
in 3ab7ef28eebf9019eb3d3c4efd7ebfd160106bb1 because it was thought to
cause a bot failure, which ended up being unrelated to this patch set.

Differential Revision: https://reviews.llvm.org/D154856
2023-07-11 13:16:20 -07:00
JP Lehr
3ab7ef28ee Revert "[MemProf] Use new option/pass for profile feedback and matching"
This reverts commit b4a82b62258c5f650a1cccf5b179933e6bae4867.

Broke AMDGPU OpenMP Offload buildbot
2023-07-11 05:44:42 -04:00