llvm-project

Author	SHA1	Message	Date
Hongtao Yu	950487bddf	[Pseudo Probe] Do not instrument EH blocks. This change avoids inserting probes to EH blocks. Pseudo probe can prevent block merging when probes in the blocks look different. This has a chained effect to passes incurring exponential IR growth (such as jump threading) and as a consequence the compilation may time out. Not inserting probes to EH blocks could mitigate the issue. Another benefit is that both IR size and binary size are smaller. Since EH blocks are usually cold, the change should have minimal impact to profile quality. Testing: Out of two internal large benchmarks, no perf impact seen. 1% size savings to both the `text` and the `pseudo_probe` section. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D142747	2023-01-30 13:26:56 -08:00
Fangrui Song	51b685734b	[Transforms,CodeGen] std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).	2022-12-16 23:21:27 +00:00
Krzysztof Parzyszek	c589730ad5	[YAML] Convert Optional to std::optional	2022-12-06 12:49:32 -08:00
Fangrui Song	89fae41ef1	[IR] llvm::Optional => std::optional Many llvm/IR/* files have been migrated by other contributors. This migrates most remaining files.	2022-12-05 04:13:11 +00:00
Archit Saxena	e170d955fe	Split EH code by default The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of machine function splitter especially in cases where we could do some form of static analysis to split out cold blocks if profile data is absent or profile data which may be faulty (Consider Sample PGO). Of all code that could statically be marked cold Exception handling blocks are one of them (In fact BFI framework also tends to mark them as cold), and the most in size contribution. In my experiments I found out Exception handling pads and all code reachable from there account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all Exception handling blocks and blocks only reachable from Exceptional Handling pad to cold section. This flag has shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on internal facebook search service. Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D131824	2022-08-17 12:40:31 -07:00
ARCHIT SAXENA	3bb1ce2319	Add a nop instruction if a section starts with landing pad for function splitter This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections. Detailed description: The current machine functions splitter can create ˜sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart. ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 .Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section movq %rax, %rdi <--- first instruction is at offest 0 from LPStart callq _Unwind_Resume@PLT ``` This will cause landing pad entries to become zero (.Ltmp11-foo10.cold) ``` .Lcst_begin4: .uleb128 .Ltmp9-.Lfunc_begin2 # >> Call Site 1 << .uleb128 .Ltmp10-.Ltmp9 # Call between .Ltmp9 and .Ltmp10 .uleb128 .Ltmp11-foo10.cold <---This is zero # jumps to .Ltmp11 .byte 3 # On action: 2 .uleb128 .Ltmp10-.Lfunc_begin2 # >> Call Site 2 << .uleb128 .Lfunc_end9-.Ltmp10 # Call between .Ltmp10 and .Lfunc_end9 .byte 0 # has no landing pad .byte 0 # On action: cleanup .p2align 2 ``` The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output: ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 nop <--- new instruction that is added .Ltmp11: movq %rax, %rdi callq _Unwind_Resume@PLT ``` Reviewed By: modimo, snehasish, rahmanl Differential Revision: https://reviews.llvm.org/D130133	2022-07-22 15:20:10 -07:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Snehasish Kumar	3da0aeea08	[NFC] Use hasSection instead of getSection().empty() Use the optimized check hasSection() instead of calling getSection().empty(). Originally suggested in D101004, but was dropped in the commit.	2021-04-23 10:00:38 -07:00
Snehasish Kumar	8077d0ff5c	[CodeGen] Do not split functions with attr "implicit-section-name". The #pragma clang section can be used at a coarse granularity to specify the section used for bss/data/text/rodata for global objects. When split functions is enabled, the function may be split into two parts violating user expectations. Reference: https://clang.llvm.org/docs/LanguageExtensions.html#specifying-section-names-for-global-objects-pragma-clang-section Differential Revision: https://reviews.llvm.org/D101004	2021-04-21 21:51:33 -07:00
Snehasish Kumar	2c7077e67d	[CodeGen] Split out cold exception handling pads. Support for splitting exception handling pads was added in D73739. This change updates the code to split out exception handling pads if profile information indicates that they are cold. For a given function with multiple landind pads, if one of them is hot they are all retained as part of the hot code section. Differential Revision: https://reviews.llvm.org/D96372	2021-02-11 11:23:43 -08:00
Pan, Tao	7af802994e	[CodeGen] Add text section prefix for COFF object file Text section prefix is created in CodeGenPrepare, it's file format independent implementation, text section name is written into object file in TargetLoweringObjectFile, it's file format dependent implementation, port code of adding text section prefix to text section name from ELF to COFF. Different with ELF that use '.' as concatenation character, COFF use '$' as concatenation character. That is, concatenation character is variable, so split concatenation character from text section prefix. Text section prefix is existing feature of ELF, it can help to reduce icache and itlb misses, it's also make possible aggregate other compilers e.g. v8 created same prefix sections. Furthermore, the recent feature Machine Function Splitter (basic block level text prefix section) is based on text section prefix. Reviewed By: pengfei, rnk Differential Revision: https://reviews.llvm.org/D92073	2020-12-08 18:56:21 +08:00
Snehasish Kumar	24bf6ff4e0	[llvm] Update default cutoff threshold for machine function splitter. Based on internal testing at Google we found that setting the profile summary cutoff threshold to 999950 yields the best results in terms of itlb and icache metrics (as observed on Intel CPUs). default = Split out code if no profile count available for block size-% = The fraction of bytes split out of .text and .text.hot itlb = Misses per kilo instructions (MPKI) for itlb icache = Misses per kilo instructions (MPKI) for L1 icache Search1 \| cutoff \| size-% \| itlb \| icache \| \|---------\|---------\|-----------\|---------\| \| default \| 42.5861 \| 0.0822151 \| 2.46363 \| \| 999999 \| 44.9350 \| 0.0767194 \| 2.44416 \| \| 999950 \| 50.0660 \| 0.075744 \| 2.4091 \| \| 999500 \| 56.9158 \| 0.082564 \| 2.4188 \| \| 995000 \| 63.8625 \| 0.0814927 \| 2.42832 \| \| 990000 \| 71.7314 \| 0.106906 \| 2.57785 \| Search2 \| cutoff \| size-% \| itlb \| icache \| \|---------\|--------\|----------\|---------\| \| default \| 2.8845 \| 0.626712 \| 4.73245 \| \| 999999 \| 3.3291 \| 0.602309 \| 4.70045 \| \| 999950 \| 3.8577 \| 0.587842 \| 4.71632 \| \| 999500 \| 4.4170 \| 0.63577 \| 4.68351 \| \| 995000 \| 5.1020 \| 0.657969 \| 4.82272 \| \| 990000 \| 5.7153 \| 0.719122 \| 5.39496 \| Differential Revision: https://reviews.llvm.org/D89085	2020-10-14 12:48:10 -07:00
Snehasish Kumar	94faadaca4	[llvm][CodeGen] Machine Function Splitter We introduce a codegen optimization pass which splits functions into hot and cold parts. This pass leverages the basic block sections feature recently introduced in LLVM from the Propeller project. The pass targets functions with profile coverage, identifies cold blocks and moves them to a separate section. The linker groups all cold blocks across functions together, decreasing fragmentation and improving icache and itlb utilization. We evaluated the Machine Function Splitter pass on clang bootstrap and SPECInt 2017. For clang bootstrap we observe a mean 2.33% runtime improvement with a ~32% reduction in itlb and stlb misses. Additionally, L1 icache misses reduced by 9.5% while L2 instruction misses reduced by 20%. For SPECInt we report the change in IntRate the C/C++ benchmarks. All benchmarks apart from mcf and x264 improve, on average by 0.6% with the max for deepsjeng at 1.6%. Benchmark % Change 500.perlbench_r 0.78 502.gcc_r 0.82 505.mcf_r -0.30 520.omnetpp_r 0.18 523.xalancbmk_r 0.37 525.x264_r -0.46 531.deepsjeng_r 1.61 541.leela_r 0.83 557.xz_r 0.15 Differential Revision: https://reviews.llvm.org/D85368	2020-08-28 11:10:14 -07:00

20 Commits