This allows static asserts to be set in tracing code that might use the
ReleaseToOS values as indexes.
This would have caused a compile failure instead of a runtime crash when
I added the use of a new ReleaseToOS value.
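A minimal C sketch of the pattern, assuming the enum gains a trailing count enumerator. The names `release_to_os`, `RELEASE_*`, `release_names` and `release_name` are hypothetical, not the runtime's identifiers. Adding an enumerator without extending the table then fails at compile time instead of indexing past the end at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ReleaseToOS values; COUNT must stay last. */
enum release_to_os {
  RELEASE_NORMAL,
  RELEASE_FORCE,
  RELEASE_FORCE_ALL,
  RELEASE_COUNT
};

/* Tracing table indexed by the enum values. */
static const char *const release_names[] = {"normal", "force", "force_all"};

/* Compile-time check: exactly one entry per enumerator. */
_Static_assert(sizeof(release_names) / sizeof(release_names[0]) == RELEASE_COUNT,
               "release_names needs one entry per release_to_os value");

static const char *release_name(enum release_to_os r) {
  assert(r < RELEASE_COUNT); /* runtime guard for out-of-range casts */
  return release_names[r];
}
```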
Summary:
Currently, the GPU iterates through all of the present symbols and
copies them by prefix. This is inefficient as it requires a lot of small
high-latency data transfers rather than a few large ones. Additionally,
we force every single profiling symbol to have protected visibility.
This means potentially hundreds of unnecessary symbols in the symbol
table.
This PR changes the interface to move towards start / stop section
handling. AMDGPU supports this natively as an ELF target, so few
changes are needed there. Instead of overriding visibility, we use a
single table to define the bounds, which we can obtain with one
contiguous load.
Using a table interface should also work for the in-progress HIP
implementation of this, as it wraps the start / stop sections into
standard void pointers that will be inside an already mapped region
of memory, so they should be accessible from the HIP API.
NVPTX is more difficult as it is an ELF platform without this support. I
have hooked up the 'Other' handling to work around this, but even then
it's a bit of a stretch. I could remove this support here, but I wanted
to demonstrate that we can share the ABI. However, NVPTX will only work
if we force LTO and change the backend to emit variables in the same
TL;DR, we now do this:
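```c
// Device: one table holding every section's bounds plus a version.
struct { start1, stop1, start2, stop2, start3, stop3, version; } device;
// Host: one lookup and one copy fetch the whole table,
struct host = DtoH(lookup("device"));
// then one contiguous copy per section.
counters = DtoH(host.stop - host.start);
version = DtoH(host.version);
```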
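On an ordinary ELF host, the same idea can be sketched with the linker-defined `__start_`/`__stop_` symbols that GNU ld and lld provide for any section whose name is a valid C identifier. This is a minimal sketch under that assumption: the section name `demo_prf_cnts`, the `bounds_table` layout and its version value are hypothetical, and `memcpy` stands in for the device-to-host copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counters for three hypothetical functions, all in one named section.
   volatile keeps the demo's writes ordered, since the compiler cannot see
   that the section bounds below cover these objects. */
static volatile uint64_t cnt_a __attribute__((used, section("demo_prf_cnts")));
static volatile uint64_t cnt_b __attribute__((used, section("demo_prf_cnts")));
static volatile uint64_t cnt_c __attribute__((used, section("demo_prf_cnts")));

/* Defined by the ELF linker at the bounds of the demo_prf_cnts section. */
extern uint64_t __start_demo_prf_cnts[];
extern uint64_t __stop_demo_prf_cnts[];

/* A single table, so the host needs one lookup and one load for all bounds. */
struct bounds_table {
  uint64_t *cnts_start;
  uint64_t *cnts_stop;
  uint64_t version;
};

static const struct bounds_table demo_table = {
    __start_demo_prf_cnts, __stop_demo_prf_cnts, 1 /* hypothetical version */};

/* Copies every counter with one contiguous transfer; returns how many. */
static size_t copy_counters(uint64_t *dst, size_t cap) {
  struct bounds_table host;
  memcpy(&host, &demo_table, sizeof host);       /* transfer 1: the table */
  size_t n = (size_t)(host.cnts_stop - host.cnts_start);
  assert(n <= cap);
  memcpy(dst, host.cnts_start, n * sizeof *dst); /* transfer 2: all counters */
  return n;
}
```

The point of the design is that the number of transfers no longer grows with the number of instrumented functions: one copy for the table, then one per section.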
Currently, all of these functions have empty bodies. As a result, when
this header is included and compiled with
__SANITIZER_DISABLE_CONTAINER_OVERFLOW__ defined, the compiler warns
about missing return values in the functions that have non-void return
types.
This patch returns success values for all those check functions with
non-void return types. Note: these were originally added in
https://github.com/llvm/llvm-project/pull/163468.
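The shape of the fix can be sketched as follows. The `demo_*` functions and the `DEMO_DISABLE_CONTAINER_OVERFLOW` macro are hypothetical stand-ins. The return values assume the conventions of the real interface, where a nonzero result from a verify check means the container is correctly annotated and a null pointer means no bad address was found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real configuration macro. */
#define DEMO_DISABLE_CONTAINER_OVERFLOW 1

#if DEMO_DISABLE_CONTAINER_OVERFLOW
/* void hooks can keep empty bodies; there is nothing to warn about. */
static inline void demo_annotate_contiguous_container(const void *beg,
                                                      const void *end,
                                                      const void *old_mid,
                                                      const void *new_mid) {
  (void)beg; (void)end; (void)old_mid; (void)new_mid;
}

/* Non-void checks must return something: report success. */
static inline int demo_verify_contiguous_container(const void *beg,
                                                   const void *mid,
                                                   const void *end) {
  (void)beg; (void)mid; (void)end;
  return 1; /* nonzero: container is properly annotated */
}

static inline const void *demo_find_bad_address(const void *beg,
                                                const void *mid,
                                                const void *end) {
  (void)beg; (void)mid; (void)end;
  return NULL; /* null: no improperly poisoned byte */
}
#endif
```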
This commit adds C helper functions `dnan2`, `dnorm2` and `dunder` for
handling the less critical edge cases of double-precision arithmetic,
similar to `fnan2`, `fnorm2` and `funder` that were added in commit
f7e652127772e93.
It also adds a header file that defines some register aliases for
handling double-precision numbers in AArch32 software floating point in
an endianness-independent way, by providing aliases `xh` and `xl` for
the high and low words of the first double-precision function argument,
regardless of which of them is in r0 and which in r1, and similarly `yh`
and `yl` for the second argument in r2/r3.
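The aliasing idea can be modeled in C. Under the soft-float AAPCS, a double passed in r0/r1 occupies the registers as if loaded from consecutive words in memory, so r0 holds whichever word sits at the lower address. This sketch assumes that rule and that a double's word order follows the integer byte order (true on current Arm targets). `reg_pair`, `load_pair`, `xh` and `xl` are hypothetical C stand-ins for the assembler aliases.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 2 * sizeof(uint32_t), "binary64 assumed");

/* Models a double in r0/r1: r0 gets the word at the lower address. */
struct reg_pair { uint32_t r0, r1; };

static struct reg_pair load_pair(double x) {
  uint32_t words[2];
  memcpy(words, &x, sizeof words); /* words[0] is at the lower address */
  struct reg_pair p = {words[0], words[1]};
  return p;
}

/* True on a little-endian host, where the low word comes first in memory. */
static int little_endian(void) {
  const uint32_t one = 1;
  unsigned char first;
  memcpy(&first, &one, 1);
  return first == 1;
}

/* The aliases: code written against xh/xl works for either endianness. */
static uint32_t xh(struct reg_pair p) { return little_endian() ? p.r1 : p.r0; }
static uint32_t xl(struct reg_pair p) { return little_endian() ? p.r0 : p.r1; }
```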
Tests for Android-specific behavior don't really belong here, since that
behavior depends on the configuration, which is not necessarily the same
on Android. There are already tests verifying that the config options
and flag options work properly. Android wrapper tests belong in Android.
#171941 got the builtins tests running under LLVM_ENABLE_RUNTIMES by
testing the builtins as part of the runtimes build.
As a consequence, the CMake logic in `lib/builtins/` is no longer
visible when configuring the tests (but `test/builtins/` is). This means
that the `cmake_dependent_option` from `lib/builtins/` is not accounted
for by the tests, allowing COMPILER_RT_BUILD_CRT to be YES when
COMPILER_RT_HAS_CRT is NO. As a result, the CRT tests run on platforms
where COMPILER_RT_HAS_CRT is false (#176892).
367da15a11/compiler-rt/lib/builtins/CMakeLists.txt (L1106-L1108)
Although the long-term solution could be to split both the builtins (and
their tests) out of compiler-rt into a top-level directory with shared
options, this works around the issue for the moment by checking both
COMPILER_RT_HAS_CRT and COMPILER_RT_BUILD_CRT before enabling the "crt"
feature.
Fixes #176892
Summary:
This PR enables the basic unit tests for builtins to be run on the GPU
architectures. Other targets like profiling are supported, but their
host-device nature makes them more difficult to unit test adequately.
It may be possible to do basic tests there, simply verifying that
counters are present and in the proper format for when they are copied
to the host.
Now that the corresponding libcxx change has landed, these tests should
be passing on some platforms.
This patch re-enables them for all platforms, so that we can see which
bots these do not work on and mark them unsupported accordingly.
rdar://167946476
In the builtins library, most functions have a portable C implementation
(e.g. `mulsf3.c`), and platforms might provide an optimized assembler
implementation (e.g. `arm/mulsf3.S`). The cmake script automatically
excludes the C source file corresponding to each assembly source file it
includes. Additionally, each source file name is automatically
translated into a flag that lit tests can query, with a name like
`librt_has_mulsf3`, to indicate that a function is available to be
tested.
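The naming rule can be sketched as a small C helper. This only illustrates the convention as described above, not the cmake implementation; `feature_for_source` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Maps "arm/mulsf3.S" or "mulsf3.c" to "librt_has_mulsf3".
   Returns 1 on success, 0 if the output buffer was too small. */
static int feature_for_source(const char *path, char *out, size_t out_size) {
  const char *base = strrchr(path, '/');
  base = base ? base + 1 : path; /* drop any directory prefix */
  const char *dot = strrchr(base, '.');
  size_t len = dot ? (size_t)(dot - base) : strlen(base); /* drop extension */
  int n = snprintf(out, out_size, "librt_has_%.*s", (int)len, base);
  return n >= 0 && (size_t)n < out_size;
}
```

Because the C and assembly versions map to the same flag, a test can ask whether `mulsf3` exists without caring which implementation was built.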
In future commits I plan to introduce cases where a single .S file
provides more than one function (so that they can share code easily)
and therefore must supersede more than one existing source file.
I've introduced the `crt_supersedes` cmake property, which you can set
on a .S file to name a list of .c files that it should supersede. Also,
the `crt_provides` property can be set on any source file to indicate a
list of functions it makes available for testing, in addition to the one
implied by its name.
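The resulting exclusion rule can be modeled in C: a C source is dropped if some assembly source has the same stem (name without directory or extension) or lists it in its `crt_supersedes` property. The struct layout, the fixed four-entry list and the `fmul_combined` example file are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one assembly source. */
struct asm_source {
  const char *stem;          /* "mulsf3" for arm/mulsf3.S */
  const char *supersedes[4]; /* extra C stems from crt_supersedes; NULL ends */
};

static bool c_source_excluded(const char *c_stem,
                              const struct asm_source *srcs, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(srcs[i].stem, c_stem) == 0)
      return true; /* implicit: same name as the assembly file */
    for (size_t j = 0; j < 4 && srcs[i].supersedes[j]; j++)
      if (strcmp(srcs[i].supersedes[j], c_stem) == 0)
        return true; /* explicit: listed in crt_supersedes */
  }
  return false;
}
```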
Add one new flag, `dealloc_align_mismatch`, that turns alignment checks
on or off. Add three new config parameters: one for deallocation type
mismatches (for example, aborting on a new/free mismatch if true), one
for checking whether the size parameter matches on deallocation, and
one for checking whether the alignment is correct on deallocation.
Pass extra flags to indicate whether to perform an alignment/size check.
Update the report functions to describe the errors more precisely. Add
unit tests for all of these.
This is based on these upstream CLs by jcking:
https://github.com/llvm/llvm-project/pull/147735
https://github.com/llvm/llvm-project/pull/146556
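A minimal sketch of how the three checks might combine at deallocation time. The option names, the `chunk_info` record and the convention that a size or alignment of 0 means "not provided by the caller" are assumptions for illustration, not the runtime's actual types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum alloc_kind { KIND_MALLOC, KIND_NEW, KIND_NEW_ARRAY };

enum dealloc_error {
  DEALLOC_OK,
  DEALLOC_TYPE_MISMATCH,
  DEALLOC_SIZE_MISMATCH,
  DEALLOC_ALIGN_MISMATCH
};

/* Hypothetical options, one per config parameter described above. */
struct dealloc_options {
  bool check_type;  /* e.g. memory from new released with free */
  bool check_size;  /* sized deallocation with a different size */
  bool check_align; /* aligned deallocation with a different alignment */
};

/* What the allocator recorded when the chunk was allocated. */
struct chunk_info {
  enum alloc_kind kind;
  size_t size;
  size_t align;
};

/* size or align of 0 means the caller did not pass that information. */
static enum dealloc_error check_dealloc(const struct dealloc_options *opt,
                                        const struct chunk_info *chunk,
                                        enum alloc_kind kind, size_t size,
                                        size_t align) {
  if (opt->check_type && chunk->kind != kind)
    return DEALLOC_TYPE_MISMATCH;
  if (opt->check_size && size != 0 && size != chunk->size)
    return DEALLOC_SIZE_MISMATCH;
  if (opt->check_align && align != 0 && align != chunk->align)
    return DEALLOC_ALIGN_MISMATCH;
  return DEALLOC_OK;
}
```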
Align beg address down instead of up in __asan_region_is_poisoned(), so
the shadow scan includes the first granule. This fixes a false negative
when the first granule has an unpoisoned prefix and a poisoned suffix.
Add test that covers this scenario.
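A simplified model of why the rounding direction matters, assuming 8-byte granules and treating any nonzero shadow byte as "needs a closer look", as a fast shadow scan would. The helpers are hypothetical; this is not the ASan code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRANULE 8u /* assumed shadow granularity */

static uintptr_t round_down(uintptr_t x) { return x & ~(uintptr_t)(GRANULE - 1); }
static uintptr_t round_up(uintptr_t x) { return round_down(x + GRANULE - 1); }

/* Returns true if any granule's shadow in [from, end) is nonzero.
   shadow[i] describes bytes [8*i, 8*i + 8). */
static bool shadow_scan(const int8_t *shadow, uintptr_t from, uintptr_t end) {
  for (uintptr_t g = from; g < end; g += GRANULE)
    if (shadow[g / GRANULE] != 0)
      return true;
  return false;
}
```

With granule 0 having an addressable prefix (shadow value 3) and a region starting at byte 1, rounding up starts the scan at byte 8 and never sees granule 0, while rounding down includes it.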
Summary:
The changes in https://www.github.com/llvm/llvm-project/pull/185552
allowed us to start building the standard `libclang_rt.profile.a` for
GPU targets. This PR expands on this by adding an optimized GPU routine
for counter increments and removing the special-case handling of these
functions in the OpenMP runtime.
The vast majority of these functions are boilerplate, but we should be
able to do more interesting things with this in the future, like value
or memory profiling.
As per PEP-0394[1], there is no real consensus over what binary names
Python has; specifically, 'python' could be Python 3, Python 2, or not
exist at all.
However, a python3 interpreter is generally available, and the scripts
are all written for Python 3. Unify the shebangs so that the ~50% of
shebangs that used python now use python3.
[1] https://peps.python.org/pep-0394/
__asan_region_is_poisoned() uses an exclusive end address
(end = beg + size) to validate the region [beg, end) and to compute
the aligned inner shadow region. This causes correctness issues near
the upper boundary of a memory range and could trigger address space
overflow on 32-bit targets.
1. Incorrect handling of the last byte of a memory range
The implementation checks AddrIsInMem(end) instead of the last
application byte (end - 1). For regions ending at the last byte
of Low/Mid/HighMem (e.g. __asan_region_is_poisoned(kHighMemEnd, 1)),
this returns end (kHighMemEnd + 1) instead of the original
pointer. This behavior is inconsistent with the function’s
semantics and with __asan_address_is_poisoned().
2. Address space overflow and invalid shadow range
If a region ends at the top of the virtual address space (kHighMemEnd),
e.g. on 32-bit targets, end = beg + size could wrap to 0.
This violates the invariant beg < end and could trigger
a CHECK failure.
Additionally, overflow in the RoundUpTo alignment computation
for aligned_b could produce an invalid shadow region spanning
LowShadow to HighShadow across ShadowGap, causing mem_is_zero()
to access unmapped memory and crash.
Fix by switching to an inclusive last byte:
last = beg + size - 1
All checks are now performed on beg and last. The aligned inner
shadow region is also computed from [beg, last]. An additional guard
on aligned_b skips the mapping to shadow if aligned_b wraps around
(in that case the aligned inner region is empty anyway and doesn't
require the shadow scan via mem_is_zero()).
This fixes incorrect return values at memory range ends and
prevents overflow-related crashes on 32-bit targets.
The test is extended to cover these boundary cases.
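The difference between the exclusive end and the inclusive last byte is easy to see with 32-bit arithmetic. This is a simplified model, not the ASan code: it assumes 8-byte granules, and `round_up32`, `exclusive_end`, `last_byte` and `aligned_begin` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRANULE 8u /* assumed shadow granularity */

/* Rounds x up to a multiple of GRANULE; can wrap to 0 near the top. */
static uint32_t round_up32(uint32_t x) {
  return (x + GRANULE - 1) & ~(GRANULE - 1);
}

/* Exclusive end: wraps to 0 when the region touches the top of memory. */
static uint32_t exclusive_end(uint32_t beg, uint32_t size) { return beg + size; }

/* Inclusive last byte: in range whenever the region itself is (size > 0). */
static uint32_t last_byte(uint32_t beg, uint32_t size) { return beg + size - 1; }

/* Sets *aligned_b to beg rounded up, unless rounding wrapped or went past
   last; in either case there is no aligned inner region to scan. */
static bool aligned_begin(uint32_t beg, uint32_t last, uint32_t *aligned_b) {
  uint32_t b = round_up32(beg);
  if (b < beg || b > last)
    return false;
  *aligned_b = b;
  return true;
}
```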
---------
Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
`sanitizer_common` and its tests depend on the RPC XDR header for layout
compatibility. When this header is absent from a CI or build
environment, changes that silently break the expected struct layout go
undetected, since there is nothing to fail the build.
Failing on a missing header is opt-in except on AIX, where it is on by
default (the dependency is known there and the package is
`bos.net.nfs.adt`); it is off by default elsewhere.
Changes:
1. On AIX, checks for `tirpc/rpc/xdr.h`; on all other platforms, checks
for `rpc/xdr.h`
2. Introduces `COMPILER_RT_REQUIRE_RPC_XDR_H` CMake option (default ON
on AIX, OFF elsewhere) that, when set, turns a missing header into a
fatal configuration error with an actionable message
3. Drive-by fix: Normalizes `HAVE_RPC_XDR_H` to 0 when the header is
absent, for consistent downstream `if()/#cmakedefine` behavior
Currently, when building the Go race detector (when SANITIZER_GO
is set), SANITIZER_WEAK_IMPORT is a no-op. It is perfectly fine to
define SANITIZER_WEAK_IMPORT for Go just as in the other cases. That
will tell the Go linker to treat _dyld_get_dyld_header as a weak
import.
Perhaps SANITIZER_WEAK_ATTRIBUTE can also be defined for Go. That
would be a separate patch.
Add the architecture-specific pieces needed for the ASan and UBSan
sanitizer runtimes to build and run on hexagon-unknown-linux-musl.
Without this patch, building sanitizer runtimes for Hexagon Linux fails
with:
sanitizer_linux.cpp: error: member access into incomplete type
'struct stat64'
because musl libc does not provide struct stat64. This patch routes
Hexagon through the statx() syscall path (like LoongArch) to avoid the
stat64 dependency entirely.
Changes:
* asan_mapping.h: Add ASAN_SHADOW_OFFSET_CONST (0x20000000) for Hexagon
with shadow layout documentation.
* sanitizer_linux.cpp: Implement internal_clone() for Hexagon using
inline assembly (trap0 syscall, generic clone argument order: flags,
stack, ptid, ctid, tls). Route Hexagon through the statx() path for stat
operations since musl lacks struct stat64.
* sanitizer_linux.h: Add Hexagon to the internal_clone() declaration
guard.
* sanitizer_stoptheworld_linux_libcdep.cpp: Add Hexagon to the
StopTheWorld architecture guard with register definitions.
* sanitizer_asm.h: Define ASM_TAIL_CALL as 'jump' for Hexagon.
* CMakeLists.txt: Add -fno-emulated-tls for Hexagon targets. Hexagon
Linux uses native TLS via the UGP register; emulated TLS produces broken
sanitizer runtimes with unresolvable __emutls references.
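With the new constant, the usual ASan address-to-shadow mapping (shadow = (addr >> scale) + offset) works out as sketched below, assuming the default shadow scale of 3 (one shadow byte per 8 application bytes) and 32-bit addresses. The function and macro names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SHADOW_SCALE 3            /* assumed default scale */
#define DEMO_SHADOW_OFFSET 0x20000000u /* ASAN_SHADOW_OFFSET_CONST for Hexagon */

/* Maps an application address to the address of its shadow byte. */
static uint32_t mem_to_shadow(uint32_t addr) {
  return (addr >> DEMO_SHADOW_SCALE) + DEMO_SHADOW_OFFSET;
}
```

Under these assumptions the whole 4 GiB address space maps into [0x20000000, 0x3FFFFFFF], so the shadow computation itself cannot overflow.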
As far as I am aware, AOR is no longer used anywhere within LLVM, as
most of the required code has since been ported elsewhere within the
project.
This removes the entire directory and updates some now-outdated
comments.
When the sum of two subnormal values is not itself subnormal, we need
to set the exponent to one.
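The arithmetic can be illustrated at the bit level, assuming IEEE 754 binary32 `float` and non-negative operands. For two subnormals the exponent fields are zero, so adding the raw encodings adds the significands, and a carry out of the 23-bit fraction lands in the exponent field as exactly 1, which is the encoding the normal result needs. `add_pos_subnormals` is a hypothetical helper for illustration, not the patched routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Adds two non-negative binary32 subnormals (or zeros) via their encodings. */
static float add_pos_subnormals(float a, float b) {
  uint32_t ia, ib;
  memcpy(&ia, &a, sizeof ia);
  memcpy(&ib, &b, sizeof ib);
  assert((ia >> 23) == 0 && (ib >> 23) == 0); /* sign 0, exponent field 0 */
  uint32_t sum = ia + ib; /* a fraction carry sets the exponent field to 1 */
  float r;
  memcpy(&r, &sum, sizeof r);
  return r;
}
```

The sum is exact: two 23-bit fractions add to at most 24 bits, so no rounding is needed.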
Test case:
```c
#include <stdio.h>

static volatile float x = 0x1.362b4p-127;
static volatile float x2 = 0x1.362b4p-127 * 2;

int
main (void)
{
  printf("x %a x2 %a x + x %a\n", x, x2, x + x);
  return x2 == x + x ? 0 : 1;
}
```
Signed-off-by: Keith Packard <keithp@keithp.com>
This is similar to #185770, which removes an
exception-handling-related symbol from `compiler-rt` in favor of having
definitions elsewhere. For example, the compiler-rt library is linked
into every shared object, which can result in duplicate definitions of
a symbol that, as a tag, must have exactly one definition. The
intention behind this commit is to defer the definition of this symbol
to downstream libraries, such as the one defining `longjmp` itself. An
example of this is WebAssembly/wasi-libc#772, where the responsibility
of defining this symbol now lies with wasi-libc.
The `__cpp_exception` symbol is now defined in libunwind instead of
compiler-rt. There are a few reasons for this move, but the primary one
is that compiler-rt is linked into every shared object, which makes it
unsuitable for define-once symbols such as `__cpp_exception`. Moving
the definition to the user of the symbol, libunwind itself, guarantees
that the symbol is defined exactly once and only when appropriate. A
secondary reason is that it avoids the need to compile compiler-rt
twice, once with exceptions and once without; instead, the same build
can be used both with and without exceptions.
Summary:
As suggested in https://github.com/llvm/llvm-project/pull/177665, we
should build a GPU version of the compiler-rt profile library instead of
writing it in-line in the lowering. This PR does not define anything
GPU-specific; it simply reuses the baremetal handling. Later PRs will
provide the GPU-specific handling we want in order to optimize counter
handling on the GPU.
Note that existing users will need to use the cache file or set these
options manually. If people are using the cache file as they should,
this hopefully won't break anything.
Two tests currently `FAIL` on Solaris/amd64 and Solaris/sparcv9:
```
SafeStack-Standalone-i386 :: overflow.c
SafeStack-Standalone-x86_64 :: overflow.c
```
This happens because `libclang_rt.ubsan_minimal.a` isn't built on
Solaris although it's required with `-fsanitize-minimal-runtime`.
This patch builds it on Solaris as well.
Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.
On sufficiently old versions of the Arm architecture, the optimized FP
routines are not enabled. So commit a84ee1416b6c179 should not have
enabled the extra-strict tests that go with them.
Also in that commit, I wrote a comment saying I was setting two separate
compile-time definitions (-DCOMPILER_RT_ARM_OPTIMIZED_FP and
-DCOMPILER_RT_ARM_OPTIMIZED_FP_THUMB1), and then didn't actually do it!
This caused the strict mulsf3 tests to be wrongly disabled in Thumb2.
Both of these tests fail when built with MSVC:
`shadowed-stack-serialization.cpp` - XFAIL because the metadata is not
generated.
`fakeframe-right-redzone.cpp` - UNSUPPORTED due to the optimization
limitations of the MSVC compiler.
---------
Co-authored-by: MacGyver Codilla <mcodilla@microsoft.com>
The stdlib argument can be passed either as -stdlib or as --stdlib, but
the previous regex did not account for this. A --stdlib argument was
reduced to a single dash, which made clang assume it should read from
standard input, and the build failed:
clang++: error: -E or -x required when input is from standard input
clang++: error: cannot specify -o when generating multiple output files
The files
[libcxxabi/CMakeLists.txt](bf6986f9f0/libcxxabi/CMakeLists.txt (L261))
and
[libunwind/CMakeLists.txt](bf6986f9f0/libunwind/CMakeLists.txt (L257))
account for this by removing --stdlib first.
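Matching whole arguments rather than substrings avoids leaving a stray dash behind. A C sketch of that idea over an argv-style list, handling only the `-stdlib=` and `--stdlib=` spellings; `is_stdlib_flag` and `drop_stdlib_flags` are hypothetical helpers, and the fix itself lives in the CMake regex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for "-stdlib=..." and "--stdlib=...". */
static bool is_stdlib_flag(const char *arg) {
  if (arg[0] == '-' && arg[1] == '-')
    arg++; /* treat the double-dash spelling like the single-dash one */
  return strncmp(arg, "-stdlib=", 8) == 0;
}

/* Removes stdlib flags in place, keeping order; returns the new count. */
static size_t drop_stdlib_flags(const char **args, size_t n) {
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (!is_stdlib_flag(args[i]))
      args[kept++] = args[i];
  return kept;
}
```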
Co-authored-by: Vitaly Buka <vitalybuka@google.com>