Summary:
Currently, the GPU iterates through all of the present symbols and
copies them by prefix. This is inefficient as it requires a lot of small
high-latency data transfers rather than a few large ones. Additionally,
we force every single profiling symbol to have protected visibility.
This means potentially hundreds of unnecessary symbols in the symbol
table.
This PR changes the interface to move towards the start / stop section
handling. AMDGPU supports this natively as an ELF target, so we need
little changes. Instead of overriding visibility, we use a single table
to define the bounds that we can obtain with one contiguous load.
Using a table interface should also work for the in-progress HIP
implementation for this, as it wraps the start / stop sections into
standard void pointers which will be inside of an already mapped region
of memory, so they should be accessible from the HIP API.
NVPTX is more difficult as it is an ELF platform without this support. I
have hooked up the 'Other' handling to work around this, but even then
it's a bit of a stretch. I could remove this support here, but I wanted
to demonstrate that we can share the ABI. However, NVPTX will only work
if we force LTO and change the backend to emit variables in the same
TL;DR, we now do this:
```c
struct { start1, stop1, start2, stop2, start3, stop3, version; } device;
struct host = DtoH(lookup("device"));
counters = DtoH(host.stop - host.start)
version = DtoH(host.version);
```
Currently all of these functions are empty bodies; this means that when
including this header and compiling with
__SANITIZER_DISABLE_CONTAINER_OVERFLOW__ defined, warnings are emitted
about the missing return values for those functions that do return
values.
This patch returns success values for all those check functions with
non-void return types. Note: these were originally added in
https://github.com/llvm/llvm-project/pull/163468.
__asan_region_is_poisoned() uses an exclusive end address
(end = beg + size) to validate the region [beg, end) and to compute
the aligned inner shadow region. This causes correctness issue
near memory range upper boundary and could trigger address space
overflow on 32-bit targets.
1. Incorrect handling of the last byte of a memory range
The implementation checks AddrIsInMem(end) instead of the last
application byte (end - 1). For regions ending at the last byte
of Low/Mid/HighMem (e.g. __asan_region_is_poisoned(kHighMemEnd, 1)),
this returns end (kHighMemEnd + 1) instead of the original
pointer. This behavior is inconsistent with the function’s
semantics and with __asan_address_is_poisoned().
2) address space overflow and invalid shadow range
If a region ends at the top of the virtual address space (kHighMemEnd),
e.g. on 32-bit targets, end = beg + size could wrap to 0.
This violated the invariant beg < end and could trigger
the CHECK failure.
Additionally, overflow in RoundUpTo alignment computations
for aligned_b could produce an invalid shadow region spanning
LowShadow to HighShadow across ShadowGap, leading mem_is_zero()
to access unmapped memory and crash.
Fix by switching to an inclusive last byte:
last = beg + size - 1
All checks are now performed on beg and last. The aligned inner
shadow region is also computed from [beg, last]. Additional guard
for aligned_b prevents the mapping to shadow if aligned_b is wrapped
(in this case the aligned inner region is also empty and doesn't
require the shadow scan via mem_is_zero()).
This fixes incorrect return values at memory range ends and
prevents overflow related crashes on 32-bit targets.
Test is extended to cover these boundary cases.
---------
Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
This change modernizes FuzzedDataProvider.h now that C++17+ is standard
in LLVM.
Replace the runtime if with if constexpr in ConvertUnsignedToSigned
Make the unsigned/signed comparison explicit by casting TS::max() to TU
Intended use-case is for threads that use (or switch to) stack with
special properties e.g. backed by MADV_DONTDUMP memory.
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Document a define to allow library developers to support disabling
AddressSanitizer's container overflow detection in template code at
compile time.
The primary motivation is to reduce false positives in environments
where
libraries and frameworks that cannot be recompiled with sanitizers
enabled
are called from application code. This supports disabling checks when
the
runtime environment cannot be reliably controlled to use ASAN_OPTIONS.
Key changes:
- Use the define `__SANITIZER_DISABLE_CONTAINER_OVERFLOW__` to disable
instrumentation at compile time
- Implemented redefining the container overflow APIs in
common_interface_defs.h
to use define to provide null implementation when define is present
- Update documentation in AddressSanitizer.rst to suggest and illustrate
use of the define
- Add details of the define in PrintContainerOverflowHint()
- Add test disable_container_overflow_checks to verify new hints on the
error and fill the testing gap that
ASAN_OPTIONS=detect_container_overflow=0
works
- Add tests demonstrating the issue around closed source libraries and
instrumented apps that both modify containers
This requires no compiler changes and should be supportable cross
compiler toolchains.
An RFC has been opened to discuss:
https://discourse.llvm.org/t/rfc-add-fsanitize-address-disable-container-overflow-flag-to-addresssanitizer/88349
Changes to initial PR (#140068):
- Mark failing test as unsupported for powerpc64le, as test failure is
unrelated to PR changes. See
https://github.com/llvm/llvm-project/issues/141598
---
Original description (from #140068)
The XRay interface header uses no C++ specific features aside from using
the std namespace and including the C++ variant of C headers. Yet, these
changes prevent using `xray_interface.h` in external tools relying on C
for different reasons. Make this header C compliant by using C headers,
removing the std namespace from std::size_t and guard `extern "C"`.
To make sure that further changes to not break the interface
accidentially, port one test from C++ to C. This requires the C23
standard to officially support the attribute syntax used in this test
case.
Note that this only resolves this issue for `xray_interface.h`.
`xray_records.h` is also not C compliant, but requires more work to
port.
Fixes#139902
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Reapply #147854 after fixes merged in #151398.
Change memory access histogram storage from uint64_t to uint16_t to
reduce profile size on disk. This change updates the raw profile format
to v5. Also add a histogram test in compiler-rt since we didn't have one
before. With this change the histogram memprof raw for the basic test
reduces from 75KB -> 20KB.
Change memory access histogram storage from uint64_t to uint16_t to
reduce profile size on disk. This change updates the raw profile format
to v5. Also add a histogram test in compiler-rt since we didn't have one
before. With this change the histogram memprof raw for the basic test
reduces from 75KB -> 20KB.
## Purpose
The compiler-rt project `check-same-common-code.test` test case started
failing after #142861 was merged. This change addresses the failure.
## Overview
This patch replicates the changes made by #142861 to
llvm/include/llvm/ProfileData/InstrProfData.inc in the duplicated file
compiler-rt/include/profile/InstrProfData.inc. These files otherwise
match.
## Validation
Locally built `check-profile` target and verified
`check-same-common-code.test` now passes.
The XRay interface header uses no C++ specific features aside from using
the `std` namespace and including the C++ variant of C headers. Yet,
these changes prevent using `xray_interface.h` in external tools relying
on C for different reasons. Make this header C compliant by using C
headers, removing the `std` namespace from `std::size_t` and guard
`extern "C"`.
To make sure that further changes to not break the interface
accidentally, port one test from C++ to C. This requires the C23
standard to officially support the attribute syntax used in this test
case.
Note that this only resolves this issue for `xray_interface.h`.
`xray_records.h` is also not C compliant, but requires more work to
port.
Fixes#139902
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
'list.size()' is determined at runtime, so using static_assert on it as
suggested by the TODO comment is not feasible and produces the following
error when done:
error: static assertion expression is not an integral constant
expression
initially referenced in https://github.com/bitcoin/bitcoin/pull/32024
Co-authored-by: Chand-ra <chandrapratap376@gmail.com>
Add the `__memprof_default_options_str` variable, initialized via the
`-memprof-runtime-default-options` LLVM flag, to hold the default options string
for memprof. This allows us to set these options during compile time in
the clang invocation.
Also update the docs to describe the various ways to set these options.
This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
19fb5b467b,
and helps to cover cases where the loop exit is never executed.
An example where this can occur are event handling loops.
Note that change does NOT change the default behavior.
In C++ it's UB to use undeclared values as enum.
And there is support __ATOMIC_HLE_ACQUIRE and
__ATOMIC_HLE_RELEASE need such values.
So use `int` in TSAN interface, and mask out
irrelevant bits and cast to enum ASAP.
`ThreadSanitizer.cpp` already declare morder parameterd
in these functions as `i32`.
This may looks like a slight change, as we
previously didn't mask out additional bits for `fmo`,
and `NoTsanAtomic` call. But from implementation
it's clear that they are expecting exact enum.
Reverts llvm/llvm-project#115032
Reapply llvm/llvm-project#114724
In C++ it's UB to use undeclared values as enum.
And there is support `__ATOMIC_HLE_ACQUIRE` and
`__ATOMIC_HLE_RELEASE` need such values.
Internal implementation was switched to `class enum`,
where that behavior is defined. But interface is C, so
we just switch to `int`.
Include `cstdlib` which was originally included transitively but the
changes to `vector` in libcpp breaks new builds due to missing cstdlib
header for `abort()` function call.
Add explicit symbol visibility macros to InstrProfData.inc
Annotating these symbols will fix missing symbols for InstrProfTest when
using shared library builds on windows with explicit visibility macros
enabled.
Add a empty fallback definition for LLVM_ABI macro so the code works in
compiler-rt.
This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and plugins on
window.
```
llvm\lld-link : error : undefined symbol: public: void ValueProfData::deserializeTo(InstrProfRecord&, InstrProfSymtab*)
>>> referenced by unittests\ProfileData\InstrProfTest.cpp:1372 void ValueProfileReadWriteTest_value_prof_data_read_write_Test::TestBody()
```
This fixes remaining issues in my previous PR #90959.
Changes:
- Removed dependency on LLVM header in `xray_interface.cpp`
- Fixed XRay patching for some targets due to missing changes in
architecture-specific patching functions
- Addressed some remaining compiler warnings that I missed in the
previous patch
- Formatting
I have tested these changes on `x86_64` (natively), as well as
`ppc64le`, `aarch64` and `arm32` (cross-compiled and emulated using
qemu).
**Original description:**
This PR introduces shared library (DSO) support for XRay based on a
revised version of the implementation outlined in [this
RFC](https://discourse.llvm.org/t/rfc-upstreaming-dso-instrumentation-support-for-xray/73000).
The feature enables the patching and handling of events from DSOs,
supporting both libraries linked at startup or explicitly loaded, e.g.
via `dlopen`.
This patch adds the following:
- The `-fxray-shared` flag to enable the feature (turned off by default)
- A small runtime library that is linked into every instrumented DSO,
providing position-independent trampolines and code to register with the
main XRay runtime
- Changes to the XRay runtime to support management and patching of
multiple objects
These changes are fully backward compatible, i.e. running without
instrumented DSOs will produce identical traces (in terms of recorded
function IDs) to the previous implementation.
Due to my limited ability to test on other architectures, this feature
is only implemented and tested with x86_64. Extending support to other
architectures is fairly straightforward, requiring only a
position-independent implementation of the architecture-specific
trampoline implementation (see
`compiler-rt/lib/xray/xray_trampoline_x86_64.S` for reference).
This patch does not include any functionality to resolve function IDs
from DSOs for the provided logging/tracing modes. These modes still work
and will record calls from DSOs, but symbol resolution for these
functions in not available. Getting this to work properly requires
recording information about the loaded DSOs and should IMO be discussed
in a separate RFC, as there are mulitple feasible approaches.
---------
Co-authored-by: Sebastian Kreutzer <sebastian.kreutzer@tu-darmstadt.de>
This PR registers the writeout and reset functions for `gcov` for all
modules in the PGO runtime, instead of registering them
using global constructors in each module. The change is made for AIX
only, but the same mechanism works on Linux on Power.
When registering such functions using global constructors in each module
without `-ffunction-sections`, the AIX linker cannot garbage collect
unused undefined symbols, because such symbols are grouped in the same
section as the `__sinit` symbol. Keeping such undefined symbols causes
link errors (see test case
https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1
for this scenario). This PR implements the initialization in the
runtime, hence avoiding introducing `__sinit` into each module.
The implementation adds a new global variable `__llvm_covinit_functions`
to each module. This new global variable contains the function pointers
to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s
section is the named section `__llvm_covinit`. The linker will aggregate
all the `__llvm_covinit` sections from each module
to form one single named section in the final binary. The pair of
functions
```
const __llvm_gcov_init_func_struct *__llvm_profile_begin_covinit();
const __llvm_gcov_init_func_struct *__llvm_profile_end_covinit();
```
are implemented to return the start and end address of this named
section in the final binary, and they are used in function
```
__llvm_profile_gcov_initialize()
```
(which is a constructor function in the runtime) so the runtime knows
the addresses of all the `Writeout` and `Reset` functions from all the
modules.
One noticeable implementation detail relevant to AIX is that to preserve
the `__llvm_covinit` from the linker's garbage collection, a `.ref`
pseudo instruction is inserted into them, referring to the section that
contains the `__llvm_gcov_ctr` variables, which are used in the
instrumented code. The `__llvm_gcov_ctr` variables did not belong to
named sections before, but this PR added them to the
`__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo
instruction that refers to them in the `__llvm_covinit` section.
This PR adds a `__sanitizer_copy_contiguous_container_annotations`
function, which copies annotations from one memory area to another. New
area is annotated in the same way as the old region at the beginning
(within limitations of ASan).
Overlapping case: The function supports overlapping containers, however
no assumptions should be made outside of no false positives in new
buffer area. (It doesn't modify old container annotations where it's not
necessary, false negatives may happen in edge granules of the new
container area.) I don't expect this function to be used with
overlapping buffers, but it's designed to work with them and not result
in incorrect ASan errors (false positives).
If buffers have granularity-aligned distance between them (`old_beg %
granularity == new_beg % granularity`), copying algorithm works faster.
If the distance is not granularity-aligned, annotations are copied byte
after byte.
```cpp
void __sanitizer_copy_contiguous_container_annotations(
const void *old_storage_beg_p, const void *old_storage_end_p,
const void *new_storage_beg_p, const void *new_storage_end_p) {
```
This function aims to help with short string annotations and similar
container annotations. Right now we change trait types of
`std::basic_string` when compiling with ASan and this function purpose
is reverting that change as soon as possible.
87f3407856/libcxx/include/string (L738-L751)
The goal is to not change `__trivially_relocatable` when compiling with
ASan. If this function is accepted and upstreamed, the next step is
creating a function like `__memcpy_with_asan` moving memory with ASan.
And then using this function instead of `__builtin__memcpy` while moving
trivially relocatable objects.
11a6799740/libcxx/include/__memory/uninitialized_algorithms.h (L644-L646)
---
I'm thinking if there is a good way to address fact that in a container
the new buffer is usually bigger than the previous one. We may add two
more arguments to the functions to address it (the beginning and the end
of the whole buffer.
Another potential change is removing `new_storage_end_p` as it's
redundant, because we require the same size.
Potential future work is creating a function `__asan_unsafe_memmove`,
which will be basically memmove, but with turned off instrumentation
(therefore it will allow copy data from poisoned area).
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Removes a dependency on LLVM in `xray_interface.cpp` by replacing
`llvm_unreachable` with compiler-rt's `UNREACHABLE`.
Applies clang-format to some unformatted changes.
Original PR: #90959
This PR introduces shared library (DSO) support for XRay based on a
revised version of the implementation outlined in [this
RFC](https://discourse.llvm.org/t/rfc-upstreaming-dso-instrumentation-support-for-xray/73000).
The feature enables the patching and handling of events from DSOs,
supporting both libraries linked at startup or explicitly loaded, e.g.
via `dlopen`.
This patch adds the following:
- The `-fxray-shared` flag to enable the feature (turned off by default)
- A small runtime library that is linked into every instrumented DSO,
providing position-independent trampolines and code to register with the
main XRay runtime
- Changes to the XRay runtime to support management and patching of
multiple objects
These changes are fully backward compatible, i.e. running without
instrumented DSOs will produce identical traces (in terms of recorded
function IDs) to the previous implementation.
Due to my limited ability to test on other architectures, this feature
is only implemented and tested with x86_64. Extending support to other
architectures is fairly straightforward, requiring only a
position-independent implementation of the architecture-specific
trampoline implementation (see
`compiler-rt/lib/xray/xray_trampoline_x86_64.S` for reference).
This patch does not include any functionality to resolve function IDs
from DSOs for the provided logging/tracing modes. These modes still work
and will record calls from DSOs, but symbol resolution for these
functions in not available. Getting this to work properly requires
recording information about the loaded DSOs and should IMO be discussed
in a separate RFC, as there are mulitple feasible approaches.
@petrhosek @jplehr
Pull request for issue #110823
Including the file which defines the macros we use here. This would let
user code only include this interface, rather than having to include two
files.
`counter_bias` is incompatible to Bitmap. The distance between Counters
and Bitmap is different between on-memory sections and profraw image.
Reference to `__llvm_profile_bitmap_bias` is generated only if
`-fcoverge-mcdc` `-runtime-counter-relocation` are specified. The
current implementation rejected their options.
```
Runtime counter relocation is presently not supported for MC/DC bitmaps
```
This change adds a new weak API function which makes the sanitizer
ignore the call to free(), and implements the
functionality in ASan and HWAsan. The runtime that implements this hook
can then call free() at a later point again on the same pointer (and
making sure the hook returns zero so that the memory will actually be
freed) when it's actually ready for the memory to be cleaned up.
This is needed in order to implement an sanitizer-compatible version
of Chrome's BackupRefPtr algorithm, since process-wide double-shimming
of malloc/free does not work on some platforms.
Requested and designed by @c01db33f (Mark) from Project Zero.
---------
Co-authored-by: Mark Brand <markbrand@google.com>
This Memprof preprocessor directives have diverged at some point. This
patch fixes that by copying an additional include from the LLVM side to
the compiler-rt side so the two files are identical again.
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.
Updates shadow mapping logic in histogram mode from having one 8 byte
counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only
supports this granularity as of now.
Updates the RawMemprofReader and serializing MemoryInfoBlocks to binary
format, including changing to a new version of the raw binary format
from version 3 to version 4.
Updates creating MemoryInfoBlocks with and without Histograms. When two
MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter
Histogram is removed.
Adds a memprof_histogram test case.
Initial commit for adding AccessCountHistograms up until RawProfile for
memprof
This diff contains the compiler-rt changes / preparations for nsan.
Test plan:
1. cd build/runtimes/runtimes-bins && ninja check-nsan
2. ninja check-all
Update the folder titles for targets in the monorepository that have not
seen taken care of for some time. These are the folders that targets are
organized in Visual Studio and XCode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake's IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property`, but are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
Fixes#83844.
This PR adds callbacks to mark futex syscalls as blocking. Unfortunately
we didn't have a mechanism before to mark syscalls as a blocking call,
so I had to implement it, but it mostly reuses the `BlockingCall`
implementation
[here](96819daa3d/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp (L362-L380)).
The issue includes some information but this issue was discovered
because Rust uses futexes directly. So most likely we need to update
Rust as well to use these callbacks.
Also see the latest comments in #85188 for some context.
I also sent another PR #84162 to mark `pthread_*_lock` calls as
blocking.
New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.
1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))
**Original Description**
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
This is a follow up of
7b9d73c2f9
and
5ef9ba7412
* The definition of `Type::getInt8PtrTy` is deleted. This doesn't cause
a compile error because the `Initializer` part of the macro doesn't run.