845 Commits

Author SHA1 Message Date
Kazu Hirata
639a0afa99 Revert "Deprecate the -fbasic-block-sections=labels option. (#107494)"
This reverts commit 1911a50fae8a441b445eb835b98950710d28fc88.

Several bots are failing:

https://lab.llvm.org/buildbot/#/builders/190/builds/6519
https://lab.llvm.org/buildbot/#/builders/3/builds/5248
https://lab.llvm.org/buildbot/#/builders/18/builds/4463
2024-09-25 12:34:43 -07:00
Rahman Lavaee
1911a50fae
Deprecate the -fbasic-block-sections=labels option. (#107494)
This feature is supported via the newer option
`-fbasic-block-address-map`. Using the old option still works by
delegating to the newer option, while a warning is printed to show
deprecation.
2024-09-25 12:03:38 -07:00
Fangrui Song
4a9da96dc6
[clang] Add cc1 --output-asm-variant= to set output syntax
2fcaa549a824efeb56e807fcf750a56bf985296b (2010) added cc1as option
`-output-asm-variant` (untested) to set the output syntax.
`clang -cc1as -filetype asm -output-asm-variant 1` allows AT&T input and
Intel output (`AssemblerDialect` is also used by non-x86 targets).

This patch renames the cc1as option (to avoid collision with -o) and
makes it available for cc1 to set output syntax. This allows different
input & output syntax:

```
echo 'asm("mov $1, %eax");' | clang -xc - -S -o - -Xclang --output-asm-variant=1
```

Note: `AsmWriterFlavor` (with a misleading name), used to initialize
MCAsmInfo::AssemblerDialect, is primarily used for assembly input, not
for output.
Therefore,
`echo 'asm("mov $1, %eax");' | clang -x c - -mllvm --x86-asm-syntax=intel -S -o -`,
which achieves a similar goal before Clang 19, was unintended.

Close #109157

Pull Request: https://github.com/llvm/llvm-project/pull/109360
2024-09-24 15:59:33 -07:00
Youngsuk Kim
7db641af13 [clang] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered
2024-09-19 17:18:10 -05:00
nebulark
f5ba3e1fa6
[CodeView] Flatten cmd args in frontend for LF_BUILDINFO (#106369) 2024-09-16 19:29:42 +02:00
Antonio Frighetto
2ae968a0d9
[Instrumentation] Move out to Utils (NFC) (#108532)
Utility functions have been moved out to Utils. Minor opportunity to
drop the header where not needed.
2024-09-15 21:07:40 -07:00
Kazu Hirata
5c0d61e318
[LTO] Reduce memory usage for import lists (#106772)
This patch reduces the memory usage for import lists by employing
memory-efficient data structures.

With this patch, an import list for a given destination module is
basically DenseSet<uint32_t> with each element indexing into the
deduplication table containing tuples of:

  {SourceModule, GUID, Definition/Declaration}

In one of our large applications, the peak memory usage goes down by
9.2% from 6.120GB to 5.555GB during the LTO indexing step.

This patch addresses several sources of space inefficiency associated
with std::unordered_map:

- std::unordered_map<GUID, ImportKind> takes up 16 bytes because of
  padding even though ImportKind only carries one bit of information.

- std::unordered_map uses pointers to elements, both in the hash table
  proper and for collision chains.

- We allocate an instance of std::unordered_map for each
  {Destination Module, Source Module} pair for which we have at least
  one import.  Most import lists have less than 10 imports, so the
  metadata like the size of std::unordered_map and the pointer to the
  hash table costs a lot relative to the actual contents.
2024-09-01 08:36:06 -07:00
Tarun Prabhu
a1441ca747
[flang][Driver] Add support for -mllvm -print-pipeline-passes
The behavior deliberately mimics that of clang. Ideally, -print-pipeline-passes
should be a first-class driver option. Notes to this effect have been added in 
the appropriate places in both flang and clang.

---------

Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
2024-08-29 16:21:43 -06:00
Chris Apple
f77e8f765e
[clang][rtsan] Reland realtime sanitizer codegen and driver (#102622)
This reverts commit a1e9b7e646b76bf844e8a9a101ebd27de11992ff
This relands commit d010ec6af8162a8ae4e42d2cac5282f83db0ce07

No modifications from the original patch. It was determined that the
ubsan build failure was happening even after the revert, some examples:

https://lab.llvm.org/buildbot/#/builders/159/builds/4477 
https://lab.llvm.org/buildbot/#/builders/159/builds/4478 
https://lab.llvm.org/buildbot/#/builders/159/builds/4479
2024-08-23 08:16:52 -07:00
Chris Apple
a1e9b7e646
Revert "[clang][rtsan] Introduce realtime sanitizer codegen and drive… (#105744)
…r (#102622)"

This reverts commit d010ec6af8162a8ae4e42d2cac5282f83db0ce07.

Build failure: https://lab.llvm.org/buildbot/#/builders/159/builds/4466
2024-08-22 15:19:41 -07:00
Chris Apple
d010ec6af8
[clang][rtsan] Introduce realtime sanitizer codegen and driver (#102622)
Introduce the `-fsanitize=realtime` flag in clang driver

Plug in the RealtimeSanitizer PassManager pass in Codegen, and attribute
a function based on if it has the `[[clang::nonblocking]]` function
effect.
2024-08-22 14:08:24 -07:00
Fangrui Song
eb549da9e5
[Driver] Add -Wa, options -mmapsyms={default,implicit}
-Wa,-mmapsyms=implicit enables the alternative mapping symbol scheme
discussed at #99718.

While not conforming to the current aaelf64 ABI, the option is
invaluable for those with full control over their toolchain, no reliance
on weird relocatable files, and a strong focus on minimizing both
relocatable and executable sizes.

The option is discouraged when portability of the relocatable objects is
a concern.
https://maskray.me/blog/2024-07-21-mapping-symbols-rethinking-for-efficiency
elaborates the risk.

Pull Request: https://github.com/llvm/llvm-project/pull/104542
2024-08-22 09:20:53 -07:00
Fangrui Song
3333ec1183 [Driver] Make CodeGenOptions name match MCTargetOptions names
* Initialize `X86RelaxRelocations`.
* Fix #96860 test to actually test -Wa,-msse2avx for non-x86.
2024-08-15 15:52:20 -07:00
Peter Rong
74e4694b8c
[LTO] enable ObjCARCContractPass only on optimized build (#101114)
\#92331 tried to make `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch trys to bring that back by:

1. reverts the
[revert](1579e9ca9c).
2. `createObjCARCContractPass` only on optimized builds.

Tests are updated to refelect the changes. Specifically, all `O0` tests
should not include `ObjCARCContractPass`

Signed-off-by: Peter Rong <PeterRong@meta.com>
2024-08-09 13:04:25 -07:00
Fangrui Song
04a1a3482c
[Driver] Add -Wa, options --crel and --allow-experimental-crel
The two options are discussed in a few comments around
https://github.com/llvm/llvm-project/pull/91280#issuecomment-2099344079

* -Wa,--crel: error "-Wa,--allow-experimental-crel must be specified to use -Wa,--crel..."
* -Wa,--allow-experimental-crel: no-op
* -Wa,--crel,--allow-experimental-crel: enable CREL in the integrated assembler (#91280)

MIPS's little-endian n64 ABI messed up the `r_info` field in
relocations. While this could be fixed with CREL, my intention is to
avoid complication in assembler/linker. The implementation simply
doesn't allow CREL for MIPS.

Link: https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600

Pull Request: https://github.com/llvm/llvm-project/pull/97378
2024-07-03 13:45:48 -07:00
Alexander Shaposhnikov
3402a1a4d2
[Clang] Enable nsan instrumentation pass (#97359)
Enable nsan instrumentation pass
2024-07-02 16:57:30 -07:00
Jacob Lambert
2264544e2d
[clang][CodeGen] Remove unnecessary ShouldLinkFiles conditional (#96951)
We have reworked the bitcode linking option to no longer link twice if
post-optimization linking is requested. As such, we no longer need to
conditionally link bitcodes supplied via -mlink-bitcode-file, as there
is no danger of linking them twice
2024-06-28 14:35:29 -07:00
Egor Pasko
cab81dd038
[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines (#92171)
Move EntryExitInstrumenter(PostInlining=true) to as late as possible and
EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage
(but skip for ThinLTO post-link).

This should fix the issues reported in
https://github.com/rust-lang/rust/issues/92109 and
https://github.com/llvm/llvm-project/issues/52853. These are caused
by https://reviews.llvm.org/D97608.
2024-05-31 12:48:45 -07:00
Nikita Popov
1579e9ca9c Revert "Run ObjCContractPass in Default Codegen Pipeline (#92331)"
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2.
This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7.

Causes major compile-time regressions for unoptimized builds.
2024-05-24 08:14:26 +02:00
Nuri Amari
8cc8e5d6c6
Run ObjCContractPass in Default Codegen Pipeline (#92331)
Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing arc intrinsics. This patch is motivated by that usecase.

The pass was previously added in various places codegen is performed. This patch adds the pass to the default codegen pipepline, makes sure it bails immediately if no arc intrinsics are found, and removes the adhoc scheduling of the pass. 

Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-05-23 10:04:55 -07:00
Jacob Lambert
11a6799740
[clang][CodeGen] Omit pre-opt link when post-opt is link requested (#85672)
Currently, when the -relink-builtin-bitcodes-postop option is used we
link builtin bitcodes twice: once before optimization, and again after
optimization.

With this change, we omit the pre-opt linking when the option is set,
and we rename the option to the following:

  -Xclang -mlink-builtin-bitcodes-postopt
 (-Xclang -mno-link-builtin-bitcodes-postopt) 

The goal of this change is to reduce compile time. We do lose the
theoretical benefits of pre-opt linking, but in practice these are small
than the overhead of linking twice. However we may be able to address
this in a future patch by adjusting the position of the builtin-bitcode
linking pass.

Compilations not setting the option are unaffected
2024-05-08 08:11:15 -07:00
Petr Hosek
8bcb073705
[Clang] -fseparate-named-sections option (#91028)
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
2024-05-07 09:18:55 -07:00
Arthur Eubanks
1b79a34a56
[clang] Add flag to experiment with cold function attributes (#89298)
To be removed and promoted to a proper driver flag if experiments turn
out fruitful.

For now, this can be experimented with `-mllvm
-pgo-cold-func-opt=[optsize|minsize|optnone|default] -mllvm
-enable-pgo-force-function-attrs`.

Original LLVM patch for this functionality: #69030
2024-04-19 09:30:47 -07:00
Vitaly Buka
49f0b536fd
[UBSAN] Rename remove-traps to lower-allow-check (#84853) 2024-04-04 21:29:46 -07:00
Vitaly Buka
b76eb1ddfb
[clang][CodeGen] Remove SimplifyCFGPass preceding RemoveTrapsPass (#84852)
There is no performance difference after switching to
`llvm.experimental.hot`.
2024-04-04 17:47:16 -07:00
Vitaly Buka
18380c522a
[UBSAN][HWASAN] Remove redundant flags (#87709)
Presense of `cutoff-hot` or `random-skip-rate`
should be enough to trigger optimization.
2024-04-04 14:32:30 -07:00
Vitaly Buka
633bc3bfda [CodeGen][NFC] Make an opt<> static 2024-04-02 15:47:04 -07:00
Vitaly Buka
e93b5f5a47 [ubsan][NFC] Remove recently added cl::init(false)
Extracted from #84858
2024-04-01 13:37:42 -07:00
Vitaly Buka
eaa71a97f9
[clang] Add optional pass to remove UBSAN traps using PGO (#84214)
With #83471 it reduces UBSAN overhead from 44% to 6%.
Measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks"
with PGO build.

On real large server binary we see 95% of code is still instrumented,
with 10% -> 1.5% UBSAN overhead improvements. We can pass this test only
with subset of UBSAN, so base overhead is smaller.

We have followup patches to improve it even further.
2024-03-11 12:07:28 -07:00
Fangrui Song
a331937197 [MC] Move CompressDebugSections/RelaxELFRelocations from TargetOptions/MCAsmInfo to MCTargetOptions
The convention is for such MC-specific options to reside in
MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do
not follow the convention: `CompressDebugSections` is defined in both
TargetOptions and MCAsmInfo and there is forwarding complexity.

Move the option to MCTargetOptions and hereby simplify the code. Rename
the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc
-relax-relocations and llc -x86-relax-relocations can now be unified.
2024-03-06 23:19:59 -08:00
Paul Kirth
7d8b50aaab
[clang][fat-lto-objects] Make module flags match non-FatLTO pipelines (#83159)
In addition to being rather hard to follow, there isn't a good reason
why FatLTO shouldn't just share the same code for setting module flags
for (Thin)LTO. This patch simplifies the logic and makes sure we use set
these flags in a consistent way, independent of FatLTO.

Additionally, we now test that output in the .llvm.lto section actually
matches the output from Full and Thin LTO compilation.
2024-02-28 19:11:55 -08:00
Arthur Eubanks
93cdd1b5cf
[PGO] Add ability to mark cold functions as optsize/minsize/optnone (#69030)
The performance of cold functions shouldn't matter too much, so if we
care about binary sizes, add an option to mark cold functions as
optsize/minsize for binary size, or optnone for compile times [1]. Clang
patch will be in a future patch.

This is intended to replace `shouldOptimizeForSize(Function&, ...)`.
We've seen multiple cases where calls to this expensive function, if not
careful, can blow up compile times. I will clean up users of that
function in a followup patch.

Initial version: https://reviews.llvm.org/D149800

[1]
https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388
2024-02-12 14:52:08 -08:00
Rahman Lavaee
acec6419e8
[SHT_LLVM_BB_ADDR_MAP] Allow basic-block-sections and labels be used together by decoupling the handling of the two features. (#74128)
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
    
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.

First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.

| | |
|--|--|
| Address of the function | Function Address |
|  Number of basic blocks in this function | NumBlocks |
|  BB entry 1
|  BB entry 2
|   ...
|  BB entry #NumBlocks
    
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
    
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section section as before: the base address of the section, its
number of blocks, and BB entries for its basic block. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
|  Number of sections for this function   | NumBBRanges |
| Section 1 begin address                     | BaseAddress[1]  |
| Number of basic blocks in section 1 | NumBlocks[1]    |
| BB entries for Section 1
|..................|
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges |
NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges
    
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
    
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
    
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.

We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handing to their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
  - New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
'-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
2024-02-01 17:50:46 -08:00
Fangrui Song
36b4a9ccd9
[Driver,CodeGen] Support -mtls-dialect= (#79256)
GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values

* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)

AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems very little interest in AArch32's TLSDESC support.

TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.

There seems no motivation to have fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm, and don't bother with a modules flag
metadata or function attribute.

Co-authored-by: Paul Kirth <paulkirth@google.com>
2024-01-26 09:25:38 -08:00
Paul Kirth
9d476e1e1a
[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.

This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
2024-01-23 14:04:52 -08:00
smanna12
bbe1b06fbb
[NFC][CLANG] Fix static analyzer bugs about unnecessary object copies with auto keyword (#75082)
Reported by Static Analyzer Tool:

In ​EmitAssemblyHelper::​RunOptimizationPipeline(): Using the auto
keyword without an & causes the copy of an object of type function.

 /// List of pass builder callbacks ("CodeGenOptions.h").
std::vector<std::function<void(llvm::PassBuilder &)>>
PassBuilderCallbacks;
2023-12-22 20:39:22 -06:00
Paul Kirth
d1e2b96b60
[clang][fatlto] Don't set ThinLTO module flag with FatLTO (#75079)
Since FatLTO now uses the UnifiedLTO pipeline, we should not set the
ThinLTO module flag to true, since it may cause an assertion failure.
See https://github.com/llvm/llvm-project/issues/70703 for context.
2023-12-18 13:03:13 -08:00
Zequan Wu
ab3430f891
[Profile] Add binary profile correlation for code coverage. (#69493)
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.

## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.

## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```

A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
2023-12-14 14:16:38 -05:00
Mircea Trofin
1d608fc755
[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970)
Akin other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`.

A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
2023-12-10 18:03:08 -08:00
Paul Kirth
cfe1ece833
[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180)
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it make FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link-time. Lastly, it removes dead/obsolete code related to now defunct
options that do not work with the EmbedBitcodePass implementation any
longer.
2023-11-30 17:09:34 -08:00
Jacob Lambert
8a4b90321f
[CodeGen] Add conditional to module cloning in bitcode linking (#72478)
Now that we have a commandline option dictating a second link step,
(-relink-builtin-bitcode-postop), we can condition the module creation
when linking in bitcode modules. This aims to improve performance by
avoiding unnecessary linking
2023-11-29 16:44:40 -08:00
Tom Eccles
a207e6307a
[flang] add fveclib flag (#71734)
-fveclib= allows users to choose a vectorized libm so that loops
containing math functions are vectorized.

This is implemented as much as possible in the same way as in clang. The
driver test in veclib.f90 is copied from the clang test.
2023-11-13 10:04:50 +00:00
Fangrui Song
557b064dbf Reorder HipStdPar.h after D155856. NFC 2023-11-09 23:10:49 -08:00
Jacob Lambert
c6cf329502
[CodeGen] Implement post-opt linking option for builtin bitocdes (#69371)
In this patch, we create a new ModulePass that mimics the LinkInModules
API from CodeGenAction.cpp, and a new command line option to enable the
pass. As part of the implementation, we needed to refactor the
BackendConsumer class definition into a new separate header (instead of
embedded in CodeGenAction.cpp). With this new pass, we can now re-link
bitcodes supplied via the -mlink-built-in bitcodes as part of the
RunOptimizationPipeline.

With the re-linking pass, we now handle cases where new device library
functions are introduced as part of the optimization pipeline.
Previously, these newly introduced functions (for example a fused sincos
call) would result in a linking error due to a missing function
definition. This new pass can be initiated via:

      -mllvm -relink-builtin-bitcode-postop

Also note we intentionally exclude bitcodes supplied via the
-mlink-bitcode-file option from the second linking step
2023-11-08 10:53:49 -08:00
William Moses
3c11ac5118
[Clang] Add codegen option to add passbuilder callback functions (#70171) 2023-11-06 22:26:38 -06:00
Stefan Pintilie
423ad04c67
[PowerPC] Add an alias for -mregnames so that full register names used in assembly. (#70255)
This option already exists on GCC and so it is being added to LLVM so
that we use the same option as them.
2023-11-06 12:30:19 -05:00
Zequan Wu
db7a1ed9a2 Revert "[Profile] Refactor profile correlation. (#70712)"
This reverts commit 4b383d0af93136b80841fc140da0823dfc441dd4.
2023-10-31 10:53:45 -04:00
Zequan Wu
4b383d0af9
[Profile] Refactor profile correlation. (#70712)
Refactor some code from https://github.com/llvm/llvm-project/pull/69493.

Rebase of https://github.com/llvm/llvm-project/pull/69656 on top of main
as it was messed up.
2023-10-31 10:41:01 -04:00
Alex Voicu
dd5d65adb6 [HIP][Clang][CodeGen] Add CodeGen support for hipstdpar
This patch adds the CodeGen changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. This change relaxes restrictions on what gets emitted on the device path, when compiling in `hipstdpar` mode:

1. Unless a function is explicitly marked `__host__`, it will get emitted, whereas before only `__device__` and `__global__` functions would be emitted;
2. Unsupported builtins are ignored as opposed to being marked as an error, as the decision on their validity is deferred to the `hipstdpar` specific code selection pass;
3. We add a `hipstdpar` specific pass to the opt pipeline, independent of optimisation level:
    - When compiling for the host, iff the user requested it via the `--hipstdpar-interpose-alloc` flag, we add a pass which replaces canonical allocation / deallocation functions with accelerator aware equivalents.

A test to validate that unannotated functions get correctly emitted is added as well.

Reviewed by: yaxunl, efriedma

Differential Revision: https://reviews.llvm.org/D155850
2023-10-17 11:41:36 +01:00
Matheus Izvekov
1ab68d708a
[clang][codegen] Add a verifier IR pass before any further passes. (#68015)
This helps check clang generated good IR in the first place, as
otherwise this can cause UB in subsequent passes, with the final
verification pass not catching any issues.

This for example would have helped catch
https://github.com/llvm/llvm-project/issues/67937 earlier.
2023-10-03 18:05:54 +02:00