As mentioned in https://github.com/llvm/llvm-project/pull/118989, all
sanitizers but tsan are converted to just module pass for easier
maintenance.
This patch removes the TySan function pass, convert TySan from
function+module pass to just module pass.
#120613 removed -ubsan-unique-traps and replaced it with
-fno-sanitize-merge (introduced in #120511), which allows fine-grained
control of which UBSan checks to prevent merging. This analogous patch
removes -bound-checking-unique-traps, and allows it to be controlled via
-fno-sanitize-merge=local-bounds.
Most of this patch is simply plumbing through the compiler flags into
the bounds checking pass.
Note: this patch subtly changes -fsanitize-merge (the default) to also
include -fsanitize-merge=local-bounds. This is different from the
previous behavior, where -fsanitize-merge (or the old
-ubsan-unique-traps) did not affect local-bounds (requiring the separate
-bounds-checking-unique-traps). However, we argue that the new behavior
is more intuitive.
Removing -bounds-checking-unique-traps and merging its functionality
into -fsanitize-merge breaks backwards compatibility; we hope that this
is acceptable since '-mllvm -bounds-checking-unique-traps' was an
experimental flag.
This fix the case, when single hot inlined callsite, prevent
checks for all other. This helps to reduce number of removed checks up
to 50% (deppedes on `cutoff-hot` value) .
`ScalarOptimizerLateEPCallback` was happening during
CGSCC walk, after each inlining, but this is effectively
after inlining.
Example, order in comments:
```
static void overflow() {
// 1. Inline get/set if possible
// 2. Simplify
// 3. LowerAllowCheckPass
set(get() + get());
}
void test() {
// 4. Inline
// 5. Nothing for LowerAllowCheckPass
overflow();
}
```
With this patch it will look like:
```
static void overflow() {
// 1. Inline get/set if possible
// 2. Simplify
set(get() + get());
}
void test() {
// 3. Inline
// 4. Simplify
overflow();
}
// Later, after inliner CGSCC walk complete:
// 5. LowerAllowCheckPass for `overflow`
// 6. LowerAllowCheckPass for `test`
```
This allows shared libraries instrumented with RTSan to be initialized.
This approach directly mirrors the approach in Tsan, Asan and many of
the other sanitizers
The early simplication pipeline is used in non-LTO and (Thin/Full)LTO
pre-link
stage. There are some passes that we want them in non-LTO mode, but not
at LTO
pre-link stage. The control is missing currently. This PR adds the
support. To
demonstrate the use, we only enable the internalization pass in non-LTO
mode for
AMDGPU because having it run in pre-link stage causes some issues.
Currently, the `DropTypeTests` parameter only fully works with phi nodes
and llvm.assume instructions. However, we'd like CFI to work in
conjunction with FatLTO, in so far as the bitcode section should be able
to contain the CFI instrumentation, while any incompatible bits are
dropped when compiling the object code.
To do that, we need to drop the llvm.type.test instructions everywhere,
and not just their uses in phi nodes. This patch updates the
LowerTypeTest pass so that uses are removed, and replaced with `true` in
all cases, and not just in phi nodes.
Addressing this will allow us to fix#112053 by modifying the FatLTO
pipeline.
Reviewers: pcc, nikic
Reviewed By: pcc
Pull Request: https://github.com/llvm/llvm-project/pull/112787
This feature is enabled by `-codegen-data-thinlto-two-rounds`, which
effectively runs the `-codegen-data-generate` and `-codegen-data-use` in
two rounds to enable global outlining with ThinLTO.
1. The first round: Run both optimization + codegen with a scratch
output.
Before running codegen, we serialize the optimized bitcode modules to a
temporary path.
2. From the scratch object files, we merge them into the codegen data.
3. The second round: Read the optimized bitcode modules and start the
codegen only this time.
Using the codegen data, the machine outliner effectively performs the
global outlining.
Depends on #90934, #110461 and #110463.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
This reapplies commit 1911a50fae8a441b445eb835b98950710d28fc88 with a
minor fix in lld/ELF/LTO.cpp which sets Options.BBAddrMap when
`--lto-basic-block-sections=labels` is passed.
This feature is supported via the newer option
`-fbasic-block-address-map`. Using the old option still works by
delegating to the newer option, while a warning is printed to show
deprecation.
2fcaa549a824efeb56e807fcf750a56bf985296b (2010) added cc1as option
`-output-asm-variant` (untested) to set the output syntax.
`clang -cc1as -filetype asm -output-asm-variant 1` allows AT&T input and
Intel output (`AssemblerDialect` is also used by non-x86 targets).
This patch renames the cc1as option (to avoid collision with -o) and
makes it available for cc1 to set output syntax. This allows different
input & output syntax:
```
echo 'asm("mov $1, %eax");' | clang -xc - -S -o - -Xclang --output-asm-variant=1
```
Note: `AsmWriterFlavor` (with a misleading name), used to initialize
MCAsmInfo::AssemblerDialect, is primarily used for assembly input, not
for output.
Therefore,
`echo 'asm("mov $1, %eax");' | clang -x c - -mllvm --x86-asm-syntax=intel -S -o -`,
which achieves a similar goal before Clang 19, was unintended.
Close#109157
Pull Request: https://github.com/llvm/llvm-project/pull/109360
This patch reduces the memory usage for import lists by employing
memory-efficient data structures.
With this patch, an import list for a given destination module is
basically DenseSet<uint32_t> with each element indexing into the
deduplication table containing tuples of:
{SourceModule, GUID, Definition/Declaration}
In one of our large applications, the peak memory usage goes down by
9.2% from 6.120GB to 5.555GB during the LTO indexing step.
This patch addresses several sources of space inefficiency associated
with std::unordered_map:
- std::unordered_map<GUID, ImportKind> takes up 16 bytes because of
padding even though ImportKind only carries one bit of information.
- std::unordered_map uses pointers to elements, both in the hash table
proper and for collision chains.
- We allocate an instance of std::unordered_map for each
{Destination Module, Source Module} pair for which we have at least
one import. Most import lists have less than 10 imports, so the
metadata like the size of std::unordered_map and the pointer to the
hash table costs a lot relative to the actual contents.
The behavior deliberately mimics that of clang. Ideally, -print-pipeline-passes
should be a first-class driver option. Notes to this effect have been added in
the appropriate places in both flang and clang.
---------
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
Introduce the `-fsanitize=realtime` flag in clang driver
Plug in the RealtimeSanitizer PassManager pass in Codegen, and attribute
a function based on if it has the `[[clang::nonblocking]]` function
effect.
-Wa,-mmapsyms=implicit enables the alternative mapping symbol scheme
discussed at #99718.
While not conforming to the current aaelf64 ABI, the option is
invaluable for those with full control over their toolchain, no reliance
on weird relocatable files, and a strong focus on minimizing both
relocatable and executable sizes.
The option is discouraged when portability of the relocatable objects is
a concern.
https://maskray.me/blog/2024-07-21-mapping-symbols-rethinking-for-efficiency
elaborates the risk.
Pull Request: https://github.com/llvm/llvm-project/pull/104542
\#92331 tried to make `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch trys to bring that back by:
1. reverts the
[revert](1579e9ca9c).
2. `createObjCARCContractPass` only on optimized builds.
Tests are updated to refelect the changes. Specifically, all `O0` tests
should not include `ObjCARCContractPass`
Signed-off-by: Peter Rong <PeterRong@meta.com>
We have reworked the bitcode linking option to no longer link twice if
post-optimization linking is requested. As such, we no longer need to
conditionally link bitcodes supplied via -mlink-bitcode-file, as there
is no danger of linking them twice
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2.
This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7.
Causes major compile-time regressions for unoptimized builds.
Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing arc intrinsics. This patch is motivated by that usecase.
The pass was previously added in various places codegen is performed. This patch adds the pass to the default codegen pipepline, makes sure it bails immediately if no arc intrinsics are found, and removes the adhoc scheduling of the pass.
Co-authored-by: Nuri Amari <nuriamari@fb.com>
Currently, when the -relink-builtin-bitcodes-postop option is used we
link builtin bitcodes twice: once before optimization, and again after
optimization.
With this change, we omit the pre-opt linking when the option is set,
and we rename the option to the following:
-Xclang -mlink-builtin-bitcodes-postopt
(-Xclang -mno-link-builtin-bitcodes-postopt)
The goal of this change is to reduce compile time. We do lose the
theoretical benefits of pre-opt linking, but in practice these are small
than the overhead of linking twice. However we may be able to address
this in a future patch by adjusting the position of the builtin-bitcode
linking pass.
Compilations not setting the option are unaffected
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
To be removed and promoted to a proper driver flag if experiments turn
out fruitful.
For now, this can be experimented with `-mllvm
-pgo-cold-func-opt=[optsize|minsize|optnone|default] -mllvm
-enable-pgo-force-function-attrs`.
Original LLVM patch for this functionality: #69030
With #83471 it reduces UBSAN overhead from 44% to 6%.
Measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks"
with PGO build.
On real large server binary we see 95% of code is still instrumented,
with 10% -> 1.5% UBSAN overhead improvements. We can pass this test only
with subset of UBSAN, so base overhead is smaller.
We have followup patches to improve it even further.
The convention is for such MC-specific options to reside in
MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do
not follow the convention: `CompressDebugSections` is defined in both
TargetOptions and MCAsmInfo and there is forwarding complexity.
Move the option to MCTargetOptions and hereby simplify the code. Rename
the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc
-relax-relocations and llc -x86-relax-relocations can now be unified.
In addition to being rather hard to follow, there isn't a good reason
why FatLTO shouldn't just share the same code for setting module flags
for (Thin)LTO. This patch simplifies the logic and makes sure we use set
these flags in a consistent way, independent of FatLTO.
Additionally, we now test that output in the .llvm.lto section actually
matches the output from Full and Thin LTO compilation.
The performance of cold functions shouldn't matter too much, so if we
care about binary sizes, add an option to mark cold functions as
optsize/minsize for binary size, or optnone for compile times [1]. Clang
patch will be in a future patch.
This is intended to replace `shouldOptimizeForSize(Function&, ...)`.
We've seen multiple cases where calls to this expensive function, if not
careful, can blow up compile times. I will clean up users of that
function in a followup patch.
Initial version: https://reviews.llvm.org/D149800
[1]
https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.
First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.
| | |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1
| BB entry 2
| ...
| BB entry #NumBlocks
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section section as before: the base address of the section, its
number of blocks, and BB entries for its basic block. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1
|..................|
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges |
NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.
We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handing to their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
- New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
'-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values
* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)
AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems very little interest in AArch32's TLSDESC support.
TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.
There seems no motivation to have fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm, and don't bother with a modules flag
metadata or function attribute.
Co-authored-by: Paul Kirth <paulkirth@google.com>