This adds a general function that handles intrinsics by applying the
intrinsic to the shadows, and applies it to the specific case of Arm
NEON TBL/TBX intrinsics.
This also updates the tests from
https://github.com/llvm/llvm-project/pull/114462
In https://github.com/llvm/llvm-project/pull/109837, it sets a global
variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp
which introduced a data race detected by TSan. To fix this, I decouple
the flag setting, the flags are now set
separately(`instrument-cold-function-only-path` is required to be used
with `--pgo-instrument-cold-function-only`).
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.
More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes(a bundle of passes in
`addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by
`--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`:
- 1) Config the flags in one place, i,e. adding
`--instrument-cold-function-only-path=<...>` and
`--pgo-function-entry-coverage`. Note that the instrumentation file path
is passed through `--instrument-sample-cold-function-path`, because we
cannot use the `PGOOptions.ProfileFile` as it's already used by
`-fprofile-sample-use=<...>`.
- 2) makes linker to link `compiler_rt.profile` lib(see
[ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)
).
- Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry
count to determine cold function.
Overall, the full command is like:
```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code
```
# What
This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.
# Why?
- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.
We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.
# Concerns
- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
CTMark #113200 size overhead was 5.3%, now it's 4.7%.
The patch affects only signed integers.
https://alive2.llvm.org/ce/z/Lv5hyi
* The patch replaces code which extracted sign bit,
maximized/minimized it, then packed it back, with
simple sign bit flip. The another way to think about
transformation is as a subtraction of MIN_SINT from
A/B. Then we map MIN_SINT to 0, 0 to -MIN_SINT, and
MAX_SINT to MAX_UINT.
* Then to maximize/minimize A/B we don't need
to extract sign bit, we can apply shadow the
same way as to other bits.
* After sign bit flip, we had to switch to unsigned
version of the predicates.
* After change above getHighestPossibleValue/getLowestPossibleValue
became very similar, so we can combine into a single function.
* Because the function does sign bit flip and
requires unsigned predicates used for returned values,
there is no point in keeping it as a member of class,
to hide, we switch to function local lambda.
With sampled instrumentation (#69535), profile counts may appear corrupt
and `fixFuncEntryCount` may assert. In particular a function can have a
0 block count for its entry, while later blocks are non zero. This is
only likely to happen for colder functions, so it is reasonable to take
any action that does not crash. Here we simply bail from fixing the
entry count.
This corrects a couple off by ones related to the sampling of
**instrumented** counters, and enables setting 100% rates for burst
sampling (burst duration = period).
Off by ones:
Prior to this change it was impossible to set a period of 65535 because
this was converted to fast sampling which rollsover at USHRT_MAX + 1
(65536). Similarly the burst durations would collect burst duration + 1
counts as they used an ULE comparison.
100% sampling:
Although this is not useful for a productionized use case, it does allow
for more deterministic testing with the sampling checks in place. After
all the off by ones are fixed, allowing for 100% sampling is a matter of
letting burst duration = period.
Fixes#111212.
This grows .text by 5.3% on CTMark, (or 2.6% large internal binary)
Perf regressed by 1.6%. We will try to improve in follow up patches.
It worth to pay some performance regression to fix
correctness to avoid stuff like #111212.
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr assosiated it will cause an
error.
Fixes#112633
This PR registers the writeout and reset functions for `gcov` for all
modules in the PGO runtime, instead of registering them
using global constructors in each module. The change is made for AIX
only, but the same mechanism works on Linux on Power.
When registering such functions using global constructors in each module
without `-ffunction-sections`, the AIX linker cannot garbage collect
unused undefined symbols, because such symbols are grouped in the same
section as the `__sinit` symbol. Keeping such undefined symbols causes
link errors (see test case
https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1
for this scenario). This PR implements the initialization in the
runtime, hence avoiding introducing `__sinit` into each module.
The implementation adds a new global variable `__llvm_covinit_functions`
to each module. This new global variable contains the function pointers
to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s
section is the named section `__llvm_covinit`. The linker will aggregate
all the `__llvm_covinit` sections from each module
to form one single named section in the final binary. The pair of
functions
```
const __llvm_gcov_init_func_struct *__llvm_profile_begin_covinit();
const __llvm_gcov_init_func_struct *__llvm_profile_end_covinit();
```
are implemented to return the start and end address of this named
section in the final binary, and they are used in function
```
__llvm_profile_gcov_initialize()
```
(which is a constructor function in the runtime) so the runtime knows
the addresses of all the `Writeout` and `Reset` functions from all the
modules.
One noticeable implementation detail relevant to AIX is that to preserve
the `__llvm_covinit` from the linker's garbage collection, a `.ref`
pseudo instruction is inserted into them, referring to the section that
contains the `__llvm_gcov_ctr` variables, which are used in the
instrumented code. The `__llvm_gcov_ctr` variables did not belong to
named sections before, but this PR added them to the
`__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo
instruction that refers to them in the `__llvm_covinit` section.
Implement -sanitizer-coverage-gated-trace-callbacks to gate the
invocation of the tracing callbacks based on the value of a global
variable, which is stored in a specific section.
When this option is enabled, the instrumentation will not call into the
runtime-provided callbacks for tracing, thus only incurring in a trivial
branch without going through a function call. It is up to the runtime to
toggle the value of the global variable in order to enable tracing.
This option is only supported for trace-pc-guard.
Note: will add additional support for trace-cmp in a follow up PR.
Patch by Filippo Bigarella
rdar://101626834
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:
1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
- [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
- [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested
See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
The `PGOInstrumentationGen` pass should preserve all analysis results
when nothing was actually instrumented. Currently, only modules that
contain at least a single function definition are instrumented. When a
module contains only function declarations and, optionally, global
variable definitions (a module for the regular-LTO phase for thin-LTO
when LTOUnit splitting is enabled, for example), such module is not
instrumented (yet?) and there is no reason to invalidate any analysis
results.
NFC.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
Per the discussion in #102542, it is safe to insert BBs under
`lowerIntrinsics()` since #69535 has made tolerant of modifying BBs.
So, I can get rid of using the inlined function `rmw_or`, introduced in
#96040.
Performance critical core libraries could be highly-optimized for arch
or micro-arch features. For instance, the absl crc library specializes
different templated classes among different hardwares [1]. In a
practical setting, it's likely that instrumented profiles are collected
on one type of machine and used to optimize binaries that run on
multiple types of hardwares.
While this kind of specialization is rare in terms of lines of code,
compiler can do a better job to skip vtable-based ICP.
* The per-class `Extend` implementation is arch-specific as well. If an
instrumented profile is collected on one arch and applied to another
arch where `Extend` implementation is different, `Extend` might be
regarded as unlikely function in the latter case. `ABSL_ATTRIBUTE_HOT`
annotation alleviates the problem by putting all `Extend` implementation
into the hot text section [2]
This change introduces a comma-separated list to specify the mangled
vtable names, and ICP pass will skip vtable-based comparison if a vtable
variable definition is shown to be in its class hierarchy (per LLVM type
metadata).
[1]
c6b27359c3/absl/crc/internal/crc_x86_arm_combined.cc (L621-L650)
[2]
c6b27359c3/absl/crc/internal/crc_x86_arm_combined.cc (L370C3-L370C21)
Relationship between "-hwasan-mapping-offset",
"-hwasan-with-ifunc", and "-hwasan-with-tls" can
be to hard to understand.
Now we will have "-hwasan-mapping-offset",
presense of which will imply fixed shadow.
If "-hwasan-mapping-offset-dynamic" will set one
of 3 available dynamic shadows.
As-is "-hwasan-mapping-offset" has precedence over
"-hwasan-mapping-offset-dynamic". In follow up
patches we need to use the one with last
occurrence.
The goal to is to reorder this function to make
initialization in following order:
1. Defaults
2. Target specific overrides
3. Explicit copt<> overrides
The `step` instrumentation shouldn't be treated, during use, like an `increment`. The latter is treated as a BB ID. The step isn't that, it's more of a type of value profiling. We need to distinguish between the 2 when really looking for BB IDs (==increments), and handle appropriately `step`s. In particular, we need to know when to elide them because `select`s may get elided by function cloning, if the condition of the select is statically known.
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
This adds support for:
* `muslabin32` (MIPS N32)
* `muslabi64` (MIPS N64)
* `muslf32` (LoongArch ILP32F/LP64F)
* `muslsf` (LoongArch ILP32S/LP64S)
As we start adding glibc/musl cross-compilation support for these
targets in Zig, it would make our life easier if LLVM recognized these
triples. I'm hoping this'll be uncontroversial since the same has
already been done for `musleabi`, `musleabihf`, and `muslx32`.
I intentionally left out a musl equivalent of `gnuf64` (LoongArch
ILP32D/LP64D); my understanding is that Loongson ultimately settled on
simply `gnu` for this much more common case, so there doesn't *seem* to
be a particularly compelling reason to add a `muslf64` that's basically
deprecated on arrival.
Note: I don't have commit access.
It was found that ASAN logic optimizes away out-of-bound access
instrumentation for over-aligned arrays. See #108287 for complete code
examples.
Fix it by not considering alignment during object size calculation,
since out-of-bounds access for over-aligned object is still UB and
should be reported by ASAN.
Closes: #108287
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
Added tests to the validator and fixed issues stemming from the previous skipping over BBs with single successors - which is incorrect. That would be now picked by added tests where the assertions are expected to be triggered.
The assertion that all out-edges of a BB can't be 0 is incorrect: they
can be, if that branch is on a cold subgraph.
Added validators and asserts about the expected proprerties of the
propagated counters.