https://github.com/llvm/llvm-project/pull/65972 introduced
-ubsan-unique-traps and -bounds-checking-unique-traps, which attach the
function size to the ubsantrap intrinsic.
https://github.com/llvm/llvm-project/pull/117651 changed
ubsan-unique-traps to use nomerge instead of the function size, but did
not update -bounds-checking-unique-traps. This patch adds nomerge to
bounds-checking-unique-traps.
-fno-sanitize-merge (introduced in
https://github.com/llvm/llvm-project/pull/120511) duplicates the
functionality of -ubsan-unique-traps but also allows individual checks
to be specified, e.g.:
* "-fno-sanitize-merge" without arguments is equivalent to
-ubsan-unique-traps
* "-fno-sanitize-merge=bool,enum" will apply it only to those two checks
Additionally, the naming is more consistent with the rest of the
-fsanitize- family.
This patch therefore removes -ubsan-unique-traps. This breaks backwards
compatibility; we hope that this is acceptable since '-mllvm
-ubsan-unique-traps' was an experimental flag.
This patch also adds negative test examples to bounds-checking.c, and
strengthens the NOOPTARRAY assertion to prevent spurious matches.
"-bounds-checking-unique-traps" is unaffected by this patch.
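As a rough illustration of what the flag is for, the sketch below shows a function containing two independent signed-overflow checks. The function name is hypothetical and chosen only for this example. Assuming a build with `-fsanitize=signed-integer-overflow -fsanitize-trap=signed-integer-overflow`, the two failure paths may be merged into one trap; adding `-fno-sanitize-merge=signed-integer-overflow` should keep them distinct.

```c
/* Hypothetical example: two separate UBSan checks in one function.
 * The multiplication and the addition are each checked for signed
 * overflow. With merging enabled, both failure paths may end up sharing
 * one trap, so a crash address no longer says which operation overflowed. */
int scale_and_offset(int x, int k, int d) {
    int scaled = x * k;   /* check 1: signed multiplication overflow */
    return scaled + d;    /* check 2: signed addition overflow */
}
```

In-range inputs behave the same regardless of the merge setting; the flag only affects how the failure paths are emitted.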
This patch removes the const qualifier from the base pointer argument of
`svst1wq`/`svst1wq_vnum` and `svst1dq`/`svst1dq_vnum`, in accordance
with https://github.com/ARM-software/acle/pull/359.
Currently we need at least one more version other than the default to
trigger FMV. However, we would like a header file declaration
__attribute__((target_version("default"))) void f(void);
to guarantee that there will be f.default.
Re-write the sema and codegen for the __atomic_test_and_set and
__atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.
This also adds diagnostics for the memory orderings which are not valid
for __atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
Fixes #111293.
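For context, a minimal sketch of how these two builtins are typically used is shown below. The `flag`, `try_acquire`, and `release` names are hypothetical and exist only for this example. The release ordering passed to `__atomic_clear` is one of the orderings the GCC documentation lists as valid for it.

```c
#include <stdbool.h>

/* Hypothetical one-byte flag; zero-initialised, i.e. "clear". */
static bool flag;

/* Atomically set the flag and report whether this caller acquired it.
 * __atomic_test_and_set returns the previous state, so a false return
 * from the builtin means the flag was clear before the call. */
bool try_acquire(void) {
    return !__atomic_test_and_set(&flag, __ATOMIC_ACQUIRE);
}

/* Clear the flag again with release ordering. */
void release(void) {
    __atomic_clear(&flag, __ATOMIC_RELEASE);
}
```

A second `try_acquire()` without an intervening `release()` returns false, because the flag is still set.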
This reverts commit 2691b964150c77a9e6967423383ad14a7693095e. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, which is merge, is applied by the driver but not
clang_cc1).
This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.
----
Original commit message:
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
Re-apply #113148 after revert in #119331
If function pointer signing is enabled, sign personality function
pointer stored in `.DW.ref.__gxx_personality_v0` section with IA key,
0x7EAD = `ptrauth_string_discriminator("personality")` constant
discriminator and address diversity enabled.
BasicAA currently tries to support addrspacecasts that change the index
width by performing the decomposition in the maximum of all index widths
and then trying to fix this up with in-place sign extends to get correct
overflow behavior if the actual index width is smaller.
However, even in the case where we don't mix different index widths and
just have an index width that is smaller than the maximum, the behavior
is incorrect (see test), because we only perform the index width
adjustment during decomposition and not any of the later logic -- and we
don't do anything at all for variable offsets. I'm sure that the case
where we actually mix different index widths is even more broken than
that.
Fix this by not allowing decomposition through index width changes. If
the pointers have different index widths, fall back to a base object
comparison, ignoring the offsets.
This patch adds the following intrinsics:
* Half-precision and BFloat16 convert, narrow, and interleave to 8-bit
floating-point.
// Variant is also available for: _bf16_x2
svmfloat8_t svcvtn_mf8[_f16_x2]_fpm(svfloat16x2_t zn, fpm_t fpm);
* Single-precision convert, narrow, and interleave to 8-bit
floating-point (top and bottom).
svmfloat8_t svcvtnt_mf8[_f32_x2]_fpm(svmfloat8_t zd, svfloat32x2_t zn,
fpm_t fpm);
svmfloat8_t svcvtnb_mf8[_f32_x2]_fpm(svfloat32x2_t zn, fpm_t fpm);
This patch implements the following intrinsics:
Convert to packed 8-bit floating-point format.
``` c
// Variants are also available for: _mf8[_bf16_x2] and _mf8[_f32_x4]
svmfloat8_t svcvt_mf8[_f16_x2]_fpm(svfloat16x2_t zn, fpm_t fpm) __arm_streaming;
```
Convert to interleaved 8-bit floating-point format.
``` c
svmfloat8_t svcvtn_mf8[_f32_x4]_fpm(svfloat32x4_t zn, fpm_t fpm) __arm_streaming;
```
In accordance with https://github.com/ARM-software/acle/pull/323.
Co-authored-by: Marin Lukac <marian.lukac@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
UBSan handler calls are sometimes merged by the backend, which complicates debugging. Merging is currently disabled for UBSan traps if -ubsan-unique-traps is specified or if optimization is disabled. This patch applies the same policy to non-trap handler calls.
N.B. "-ubsan-unique-traps" becomes somewhat of a misnomer since it will now apply to non-trap handler calls as well as traps; nonetheless, we keep the naming for backwards compatibility.
Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM
`datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is
currently using the defaults for both, so it's inconsistent. Fix this by
adding `-i128:128` to Wasm's `datalayout` string so that it aligns
`i128` to 16 bytes too.
This is similar to
[llvm/llvm-project@dbad963](dbad963a69)
for SPARC.
This fixes rust-lang/rust#133991; see that issue for further discussion.
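The front-end side of this mismatch can be observed directly from C. The sketch below is illustrative only: the helper and struct names are hypothetical, and it assumes a target where the compiler provides `__int128` with the 16-byte alignment described above.

```c
#include <stddef.h>

/* Hypothetical struct: the 128-bit member must start on a 16-byte
 * boundary if the front end aligns __int128 to 16 bytes. */
struct tagged_wide {
    char tag;
    __int128 value;
};

/* Alignment the front end assigns to __int128. */
size_t int128_alignment(void) {
    return _Alignof(__int128);
}

/* Offset of the 128-bit member, which reflects the padding after tag. */
size_t wide_member_offset(void) {
    return offsetof(struct tagged_wide, value);
}
```

If the `datalayout` string instead says 8 bytes for `i128`, LLVM-level code and code from other front ends can disagree with this layout, which is the inconsistency the patch removes.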
[defaults to aligning `__int128_t` to 16 bytes]:
f8b4182f07/clang/lib/Basic/TargetInfo.cpp (L77)
[default to aligning `i128` to 8 bytes]:
https://llvm.org/docs/LangRef.html#langref-datalayout
- Use `poison` instead of `undef` as a phi operand for an unreachable path (the predecessor
will not go to the BB that uses the value of the phi).
- Call `@llvm.vector.insert` with a `poison` subvec when performing a
`bitcast` from a fixed vector to a scalable vector.
This patch adds the following intrinsics:
* 8-bit floating-point convert to half-precision and BFloat16.
// Variants are also available for: _bf16
svfloat16_t svcvt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
svfloat16_t svcvt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
* 8-bit floating-point convert to half-precision and BFloat16 (top).
// Variants are also available for: _bf16
svfloat16_t svcvtlt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
svfloat16_t svcvtlt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
When we have a gep inbounds from the base of an object (e.g. alloca or
global), we know that the index cannot be negative, as this would go out
of bounds. As such, we can infer nuw as well.
The implementation is a bit stricter than necessary: we could also
accept one unknown index followed by known-non-negative indices.
Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently
incorrectly doesn't require the inbounds for the alloca case, see
https://github.com/AliveToolkit/alive2/issues/1138).
If function pointer signing is enabled, sign personality function
pointer stored in `.DW.ref.__gxx_personality_v0` section with IA key,
0x7EAD = `ptrauth_string_discriminator("personality")` constant
discriminator and address diversity enabled.
afa2fbf87a8e3fff609fd325c938929c48e94280 adds a test which can fail with
`error: unable to open output file 'fixed-register-global.o':
'Permission denied'`. We don't check the output file at all, so just use
/dev/null.
This shows that ubsan handlers do not have nomerge attributes in
non-trap mode, even if -ubsan-unique-traps is enabled.
0d15d46362bd6ab5a9a2165805adaab13a7689f4 attaches nomerge but only for
trap mode.
---------
Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
This patch implements the following intrinsics:
8-bit floating-point convert to half-precision or BFloat16 (in-order).
``` c
// Variant is also available for: _bf16[_mf8]_x2
svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```
In accordance with https://github.com/ARM-software/acle/pull/323.
Co-authored-by: Marin Lukac <marian.lukac@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
This fixes the case where a single hot inlined callsite prevents
checks for all the others. This helps to reduce the number of removed
checks by up to 50% (depending on the `cutoff-hot` value).
`ScalarOptimizerLateEPCallback` was happening during
CGSCC walk, after each inlining, but this is effectively
after inlining.
Example, order in comments:
```
static void overflow() {
// 1. Inline get/set if possible
// 2. Simplify
// 3. LowerAllowCheckPass
set(get() + get());
}
void test() {
// 4. Inline
// 5. Nothing for LowerAllowCheckPass
overflow();
}
```
With this patch it will look like:
```
static void overflow() {
// 1. Inline get/set if possible
// 2. Simplify
set(get() + get());
}
void test() {
// 3. Inline
// 4. Simplify
overflow();
}
// Later, after the inliner CGSCC walk completes:
// 5. LowerAllowCheckPass for `overflow`
// 6. LowerAllowCheckPass for `test`
```
Relanding the patch with a fix for a test failure on build bots that do
not build LLVM for AArch64.
Fixes #76426, #109778 (for AArch64)
The previous patch for this issue, #94271, generated an error message if
a register and a global variable did not have the same size. This patch
checks if the register is reserved.
Fixes #76426, #109778 (for AArch64)
The previous patch for this issue, #94271, generated an error message if
a register and a global variable did not have the same size. This patch
checks if the register is reserved.
If the GEP is nusw/inbounds and has all-non-negative offsets infer nuw
as well.
This doesn't have measurable compile-time impact.
Proof: https://alive2.llvm.org/ce/z/ihztLy
There were two bugs in the implementation of the MVE vsbciq (subtract
with carry across vector, with initial carry value) intrinsics:
* The VSBCI instruction behaves as if the carry-in is always set, but we
were selecting it when the carry-in is clear.
* The vsbciq intrinsics should generate IR with the carry-in set, but
they were leaving it clear.
These two bugs almost cancelled each other out, but resulted in
incorrect code when the vsbcq intrinsics (with a carry-in) were used
and the carry-in was a compile-time constant.
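To make the carry convention concrete, here is a plain-C scalar model of a subtract-with-carry chained across four 32-bit lanes. It is not the MVE intrinsic and not the code from this patch: `sbc_lanes` is a hypothetical helper, and the model assumes the usual Arm convention that a set carry means "no borrow", so each lane computes `a + ~b + carry`.

```c
#include <stdint.h>

/* Hypothetical scalar model: subtract b from a lane by lane, threading
 * the carry from each lane into the next. carry_in == 1 models the
 * "initial carry set" form (a plain a - b in the lowest lane);
 * carry_in == 0 subtracts one extra in the lowest lane. */
void sbc_lanes(const uint32_t a[4], const uint32_t b[4], unsigned carry_in,
               uint32_t out[4], unsigned *carry_out) {
    unsigned carry = carry_in & 1u;
    for (int i = 0; i < 4; ++i) {
        /* a - b - (1 - carry) expressed as a + ~b + carry in 64 bits. */
        uint64_t wide = (uint64_t)a[i] + (uint64_t)(uint32_t)~b[i] + carry;
        out[i] = (uint32_t)wide;
        carry = (unsigned)(wide >> 32);
    }
    *carry_out = carry;
}
```

Under this model, selecting the "carry always set" form when the carry-in is actually clear changes the lowest lane by one, which is the kind of off-by-one the first bug describes.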