9571 Commits

Author SHA1 Message Date
Thurston Dang
d33a2c5811
[BoundsSan] Update BoundsChecking.cpp to use no-merge attribute where applicable (#120620)
https://github.com/llvm/llvm-project/pull/65972 introduced
-ubsan-unique-traps and -bounds-checking-unique-traps, which attach the
function size to the ubsantrap intrinsic.

https://github.com/llvm/llvm-project/pull/117651 changed
-ubsan-unique-traps to use nomerge instead of the function size, but did
not update -bounds-checking-unique-traps. This patch adds nomerge to
-bounds-checking-unique-traps.
2024-12-19 13:31:29 -08:00
Florian Hahn
9e322c56f7
[TySan] Don't report globals with external storage. (#120565)
Globals with external storage should have been initialized where they
are defined.

Fixes https://github.com/llvm/llvm-project/issues/120448

PR: https://github.com/llvm/llvm-project/pull/120565
2024-12-19 21:30:56 +00:00
Thurston Dang
cb8a90b7d1
[ubsan] Remove -ubsan-unique-traps (replace with -fno-sanitize-merge) (#120613)
-fno-sanitize-merge (introduced in
https://github.com/llvm/llvm-project/pull/120511) duplicates the
functionality of -ubsan-unique-traps but also allows individual checks
to be specified, e.g.:
* "-fno-sanitize-merge" without arguments is equivalent to
-ubsan-unique-traps
* "-fno-sanitize-merge=bool,enum" will apply it only to those two checks

Additionally, the naming is more consistent with the rest of the
-fsanitize- family.

This patch therefore removes -ubsan-unique-traps. This breaks backwards
compatibility; we hope that this is acceptable since '-mllvm
-ubsan-unique-traps' was an experimental flag.

This patch also adds negative test examples to bounds-checking.c, and
strengthens the NOOPTARRAY assertion to prevent spurious matches.

"-bounds-checking-unique-traps" is unaffected by this patch.
2024-12-19 12:53:48 -08:00
SpencerAbson
9469fd24b9
[Clang][AArch64] Remove const from base pointers in sve2p1 stores (#120551)
This patch removes the const qualifier from the base pointer argument of
`svst1wq`/`svst1wq_vnum` and `svst1dq`/`svst1dq_vnum`, in accordance
with https://github.com/ARM-software/acle/pull/359.
2024-12-19 14:13:02 +00:00
SpencerAbson
db84ae3a68
[Clang][AArch64] Add signed index/offset variants of sve2p1 qword stores (#120549)
This patch adds signed offset/index variants to the SVE2p1 quadword
store intrinsics, in accordance with
https://github.com/ARM-software/acle/pull/359.
2024-12-19 13:27:07 +00:00
Alexandros Lamprineas
6586c676b4
[FMV][AArch64] Emit mangled default version if explicitly specified. (#120022)
Currently, FMV is only triggered when there is at least one version
other than the default. However, we would like a header file declaration

__attribute__((target_version("default"))) void f(void);

to guarantee that the mangled symbol f.default is emitted.
2024-12-19 12:06:46 +00:00
Oliver Stannard
9fc2fadbfc
[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449)
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.

This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
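For context, here is a minimal sketch of a toy spinlock built on `__atomic_test_and_set` and `__atomic_clear`, following the GCC documentation linked above. The helper names `toy_try_lock`, `toy_lock` and `toy_unlock` are hypothetical. `__atomic_clear` is given `__ATOMIC_RELEASE`, one of the three orderings GCC documents as valid for it (alongside `__ATOMIC_RELAXED` and `__ATOMIC_SEQ_CST`). `__ATOMIC_CONSUME`, `__ATOMIC_ACQUIRE` and `__ATOMIC_ACQ_REL` are the orderings the new diagnostics target.

```c
#include <stdbool.h>

/* Hypothetical helpers: a toy spinlock over a single bool flag. */

/* Returns true if the lock was free and is now held by the caller.
   __atomic_test_and_set returns true iff the byte was already "set". */
static bool toy_try_lock(bool *flag) {
    return !__atomic_test_and_set(flag, __ATOMIC_ACQUIRE);
}

/* Spins until the flag is acquired. */
static void toy_lock(bool *flag) {
    while (__atomic_test_and_set(flag, __ATOMIC_ACQUIRE)) {
        /* busy-wait */
    }
}

/* Releases the lock. Per the GCC docs, __atomic_clear only accepts
   __ATOMIC_RELAXED, __ATOMIC_RELEASE or __ATOMIC_SEQ_CST. */
static void toy_unlock(bool *flag) {
    __atomic_clear(flag, __ATOMIC_RELEASE);
}
```

A real lock would usually add a pause/yield hint in the spin loop. It is omitted here to keep the sketch to the two builtins under discussion.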

Fixes #111293.
2024-12-19 09:12:19 +00:00
Thurston Dang
ffff7bb582
Reapply "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120…464)" (#120511)
This reverts commit 2691b964150c77a9e6967423383ad14a7693095e. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, merging, is applied by the driver but not by
clang_cc1).

This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.

----

Original commit message:
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; if specified,
that flag overrides -f(no-)sanitize-merge.
2024-12-18 18:13:26 -08:00
Thurston Dang
2691b96415
Revert "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)"
This reverts commit 7eaf4708098c216bf432fc7e0bc79c3771e793a4.

Reason: buildbot breakage (e.g.,
https://lab.llvm.org/buildbot/#/builders/144/builds/14299/steps/6/logs/FAIL__Clang__ubsan-trap-debugloc_c)
2024-12-18 23:50:01 +00:00
Thurston Dang
7eaf470809
[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; if specified,
that flag overrides -f(no-)sanitize-merge.
2024-12-18 15:36:12 -08:00
Alexander Kornienko
23a239267e
Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460)
Reverts llvm/llvm-project#119225 due to the lack of sanitizer support,
the large potential for breaking code containing latent UB, the
non-trivial localization and investigation of such breakage, and what
seems to be a bad interaction with msan (a test is in the works).

Related discussions:
https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822
https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255
2024-12-18 19:06:34 +01:00
Florian Hahn
c135f6ffe2
[TySan] Add initial Type Sanitizer support to Clang (#76260)
This patch introduces the Clang components of type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

The Clang changes are mostly formulaic, the one specific change being
that when the TBAA sanitizer is enabled, TBAA is always generated, even
at -O0.

It goes together with the corresponding LLVM changes
(https://github.com/llvm/llvm-project/pull/76259) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261).

PR: https://github.com/llvm/llvm-project/pull/76260
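To make "type-based aliasing violation" concrete, here is a minimal sketch. Reading a `float` object through a `uint32_t` lvalue breaks C's effective-type rules, and that is the kind of access a type sanitizer is designed to report. Copying the bytes with `memcpy` is the well-defined alternative. The function name `float_bits` is hypothetical, and the sketch assumes a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Violating version (shown only as a comment; do not use):
 *
 *   uint32_t float_bits_bad(float f) {
 *       return *(uint32_t *)&f;   // accesses a float object as uint32_t
 *   }
 *
 * This breaks the effective-type (strict aliasing) rules, so TBAA-based
 * optimizations may assume the two lvalues never alias.
 */

/* Well-defined version: copy the object representation instead. */
static uint32_t float_bits(float f) {
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "assumes a 32-bit float");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```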
2024-12-17 15:13:42 +00:00
SpencerAbson
908e30658d
[AArch64] Implement intrinsics for FP8 SME FMLAL/FMLALL (multi) (#119546)
This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long (multiple vectors).

``` c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");                              
```

In accordance with https://github.com/ARM-software/acle/pull/323
2024-12-17 11:47:20 +00:00
SpencerAbson
9c89b40f18
[AArch64] Implement intrinsics for FMLAL/FMLALL (single) (#119568)
Multi-vector 8-bit floating-point multiply-add long (single)
```c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla[_single]_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla[_single]_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
```
In accordance with https://github.com/ARM-software/acle/pull/323.

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2024-12-17 09:31:54 +00:00
Florian Mayer
514580b438
[MTE] Apply alignment / size in AsmPrinter rather than IR (#111918)
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
2024-12-17 00:47:02 -08:00
SpencerAbson
38099d0608
[AArch64] Implement intrinsics for SME FP8 FMLAL/FMLALL (Indexed) (#118549)
This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
  void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  // Only if __ARM_FEATURE_SME_F8F32 != 0
  void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");
```
In accordance with: https://github.com/ARM-software/acle/pull/323
2024-12-16 21:45:38 +00:00
Jonathan Thackray
8380bafaed
[AArch64] Add intrinsics for SME FP8 FVDOT, FVDOTB and FVDOTT instructions (#119922)
Add support for the following SME 8-bit floating-point dot-product
intrinsics:

```
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svvdot_lane_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                      svmfloat8_t zm, uint64_t imm_idx,
                                      fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
void svvdott_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

void svvdotb_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");
```

---------

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
2024-12-16 14:42:45 +00:00
Jonathan Thackray
ef4b597015
[AArch64] Add intrinsics for SME FP8 FDOT single and multi instructions (#119845)
Add support for the following SME 8-bit floating-point dot-product intrinsics:

```
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svdot[_single]_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot[_single]_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
void svdot[_single]_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot[_single]_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
```

These intrinsics are extracted from:
https://github.com/ARM-software/acle/pull/323/

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
2024-12-16 13:14:40 +00:00
Daniil Kovalev
f65a21a4ec
[PAC][ELF][AArch64] Support signed personality function pointer (#119361)
Re-apply #113148 after revert in #119331

If function pointer signing is enabled, sign the personality function
pointer stored in the `.DW.ref.__gxx_personality_v0` section using the IA
key, the constant discriminator 0x7EAD
(= `ptrauth_string_discriminator("personality")`), and address diversity.
2024-12-16 10:24:09 +03:00
Momchil Velikov
2eed88da6a
[AArch64] Implement FP8 SVE intrinsics for fused multiply-add (#118126)
This patch adds the following intrinsics:

* 8-bit floating-point multiply-add long to half-precision (bottom).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat16_t svmlalb[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                    svmfloat8_t zm, fpm_t fpm);
  svfloat16_t svmlalb[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                      mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long to half-precision (bottom,
  indexed).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat16_t svmlalb_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                         svmfloat8_t zm, uint64_t imm0_15,
                                         fpm_t fpm);

* 8-bit floating-point multiply-add long to half-precision (top).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat16_t svmlalt[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                    svmfloat8_t zm, fpm_t fpm);
  svfloat16_t svmlalt[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                      mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long to half-precision (top,
  indexed).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat16_t svmlalt_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                         svmfloat8_t zm, uint64_t imm0_15,
                                         fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (bottom bottom).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlallbb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                      svmfloat8_t zm, fpm_t fpm);
  svfloat32_t svmlallbb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                        mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (bottom bottom, indexed).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlallbb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                           svmfloat8_t zm, uint64_t imm0_15,
                                           fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (bottom top).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlallbt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                      svmfloat8_t zm, fpm_t fpm);
  svfloat32_t svmlallbt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                        mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (bottom top, indexed).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlallbt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                           svmfloat8_t zm, uint64_t imm0_15,
                                           fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (top bottom).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlalltb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                      svmfloat8_t zm, fpm_t fpm);
  svfloat32_t svmlalltb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                        mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (top bottom, indexed).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlalltb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                           svmfloat8_t zm, uint64_t imm0_15,
                                           fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (top top).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlalltt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                      svmfloat8_t zm, fpm_t fpm);
  svfloat32_t svmlalltt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                        mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
  (top top, indexed).

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
  //         __ARM_FEATURE_SSVE_FP8FMA
  svfloat32_t svmlalltt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                           svmfloat8_t zm, uint64_t imm0_15,
                                           fpm_t fpm);
2024-12-13 21:05:27 +00:00
Momchil Velikov
c2172431c7
[AArch64] Implement FP8 SVE intrinsics for dot-product (#118125)
This patch adds the following intrinsics:

* 8-bit floating-point dot product to single-precision.

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) ||
  //         __ARM_FEATURE_SSVE_FP8DOT4
  svfloat32_t svdot[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                  svmfloat8_t zm, fpm_t fpm);
  svfloat32_t svdot[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                    mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point indexed dot product to single-precision.

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) ||
  //         __ARM_FEATURE_SSVE_FP8DOT4
  svfloat32_t svdot_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm0_3,
                                       fpm_t fpm);

* 8-bit floating-point dot product to half-precision.

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) ||
  //         __ARM_FEATURE_SSVE_FP8DOT2
  svfloat16_t svdot[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                  svmfloat8_t zm, fpm_t fpm);
  svfloat16_t svdot[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                    mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point indexed dot product to half-precision.

  // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) ||
  //         __ARM_FEATURE_SSVE_FP8DOT2
  svfloat16_t svdot_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm0_7,
                                       fpm_t fpm);
2024-12-13 14:06:54 +00:00
Nikita Popov
a30e50fcb3
[BasicAA] Do not decompose past casts with different index width (#119365)
BasicAA currently tries to support addrspacecasts that change the index
width by performing the decomposition in the maximum of all index widths
and then trying to fix this up with in-place sign extends to get correct
overflow behavior if the actual index width is smaller.

However, even in the case where we don't mix different index widths and
just have an index width that is smaller than the maximum, the behavior
is incorrect (see test), because we only perform the index width
adjustment during decomposition and not any of the later logic -- and we
don't do anything at all for variable offsets. I'm sure that the case
where we actually mix different index widths is even more broken than
that.

Fix this by not allowing decomposition through index width changes. If
the pointers have different index widths, fall back to a base object
comparison, ignoring the offsets.
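A toy illustration of why the index width matters (plain integer arithmetic, not BasicAA's code). The same constant indices produce a different offset when GEP arithmetic is performed at a 32-bit index width, where it wraps, than at a 64-bit one, so a decomposition carried out at the wider width does not describe the narrower pointer. The helper names are hypothetical. Converting an out-of-range value to `int32_t` is implementation-defined in C; GCC and Clang wrap modulo 2^32.

```c
#include <stdint.h>

/* Offset of base + a + b when the pointer's index type is 64 bits wide. */
static int64_t offset_idx64(int64_t a, int64_t b) {
    return a + b;
}

/* Same offset when the index type is 32 bits: the sum is computed
   modulo 2^32 and then interpreted as a signed 32-bit value. */
static int64_t offset_idx32(int64_t a, int64_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (int64_t)(int32_t)sum;
}
```

For small offsets the two agree. Near the 32-bit boundary they diverge (for example, `INT32_MAX + 1` becomes `-2^31` at the narrow width), which is why falling back to a base-object comparison is the conservative choice.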
2024-12-13 12:58:59 +01:00
Jonathan Thackray
1fd3d1d04e
[AArch64] Add intrinsics for SME FP8 FDOT LANE instructions (#118492)
Add support for the following SME 8-bit floating-point dot-product
intrinsics:
* void svdot_lane_za16_mf8_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
  svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm);
* void svdot_lane_za16_mf8_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
  svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm);
* void svdot_lane_za32_mf8_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
  svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm);
* void svdot_lane_za32_mf8_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
  svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm);

---------

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
Co-authored-by: SpencerAbson <Spencer.Abson@arm.com>
2024-12-13 09:09:36 +00:00
Alexandros Lamprineas
6f013dbced
[AArch64][FMV] Add missing feature dependencies and detect at runtime. (#119231)
i8mm -> simd
fp16fml -> simd
frintts -> fp
bf16 -> simd
sme -> fp16

Approved in ACLE as https://github.com/ARM-software/acle/pull/368
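One way to picture what these dependencies mean for a function-multiversioning resolver: a version is only selectable if every feature it requires, including the implied ones listed above, is present in the detected CPU feature mask. The sketch below is hypothetical. The bit assignments, `required_mask` and `version_is_usable` are invented for illustration and do not mirror compiler-rt's actual tables. Only the direct edges from the list are modelled, not transitive ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits (not compiler-rt's real encoding). */
enum {
    FEAT_FP      = 1u << 0,
    FEAT_SIMD    = 1u << 1,
    FEAT_FP16    = 1u << 2,
    FEAT_I8MM    = 1u << 3,
    FEAT_FP16FML = 1u << 4,
    FEAT_FRINTTS = 1u << 5,
    FEAT_BF16    = 1u << 6,
    FEAT_SME     = 1u << 7,
};

/* Expands a feature into itself plus the dependencies listed above. */
static uint32_t required_mask(uint32_t feat) {
    switch (feat) {
    case FEAT_I8MM:    return FEAT_I8MM | FEAT_SIMD;
    case FEAT_FP16FML: return FEAT_FP16FML | FEAT_SIMD;
    case FEAT_FRINTTS: return FEAT_FRINTTS | FEAT_FP;
    case FEAT_BF16:    return FEAT_BF16 | FEAT_SIMD;
    case FEAT_SME:     return FEAT_SME | FEAT_FP16;
    default:           return feat;
    }
}

/* A version is usable only if all of its required bits were detected. */
static bool version_is_usable(uint32_t feat, uint32_t detected) {
    uint32_t need = required_mask(feat);
    return (detected & need) == need;
}
```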
2024-12-11 22:11:32 +00:00
Momchil Velikov
b1d8c60dd4
[AArch64] Implement FP8 SVE Intrinsics for narrowing conversions (#118124)
This patch adds the following intrinsics:

* Half-precision and BFloat16 convert, narrow, and interleave to 8-bit
floating-point.

  // Variant is also available for: _bf16_x2
  svmfloat8_t svcvtn_mf8[_f16_x2]_fpm(svfloat16x2_t zn, fpm_t fpm);

* Single-precision convert, narrow, and interleave to 8-bit
floating-point (top and bottom).

  svmfloat8_t svcvtnt_mf8[_f32_x2]_fpm(svmfloat8_t zd, svfloat32x2_t zn,
                                       fpm_t fpm);
  svmfloat8_t svcvtnb_mf8[_f32_x2]_fpm(svfloat32x2_t zn, fpm_t fpm);
2024-12-11 13:37:15 +00:00
SpencerAbson
b0763a472b
[AArch64] Implement intrinsics for FP8 FCVT/FCVTN/BFCVT (#118025)
This patch implements the following intrinsics:

Convert to packed 8-bit floating-point format.
``` c
  // Variants are also available for: _mf8[_bf16_x2] and _mf8[_f32_x4]
  svmfloat8_t svcvt_mf8[_f16_x2]_fpm(svfloat16x2_t zn, fpm_t fpm) __arm_streaming;
```
Convert to interleaved 8-bit floating-point format.
``` c
  svmfloat8_t svcvtn_mf8[_f32_x4]_fpm(svfloat32x4_t zn, fpm_t fpm) __arm_streaming;
```
In accordance with https://github.com/ARM-software/acle/pull/323.

Co-authored-by: Marian Lukac <marian.lukac@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
2024-12-11 09:17:43 +00:00
Thurston Dang
67bd04facf
[ubsan] Don't merge non-trap handlers if -ubsan-unique-traps or not optimized (#119302)
UBSan handler calls are sometimes merged by the backend, which complicates debugging. Merging is currently disabled for UBSan traps if -ubsan-unique-traps is specified or if optimization is disabled. This patch applies the same policy to non-trap handler calls.

N.B. "-ubsan-unique-traps" becomes somewhat of a misnomer since it will now apply to non-trap handler calls as well as traps; nonetheless, we keep the naming for backwards compatibility.
2024-12-10 15:25:24 -08:00
anoopkg6
dc04d414df
SystemZ: Add support for __builtin_setjmp and __builtin_longjmp. (#119257)
This PR includes fixes for the original PR #116642.
Implements __builtin_setjmp and __builtin_longjmp for SystemZ.
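A minimal usage sketch of the two builtins, under stated assumptions. It assumes a five-word buffer, as described for LLVM's `llvm.eh.sjlj.setjmp`. Per the GCC documentation, the value passed to `__builtin_longjmp` must be 1, and `__builtin_longjmp` must not be called from the same function that called `__builtin_setjmp` (hence the `noinline` helper). The names `jump_buf`, `jump_back` and `run` are hypothetical. GCC documents these builtins as intended mainly for internal use such as exception handling; portable code should prefer `setjmp`/`longjmp` from `<setjmp.h>`.

```c
/* Five-word buffer (assumption, matching LLVM's SjLj buffer layout). */
static void *jump_buf[5];

/* Must live in a different function from the __builtin_setjmp call. */
__attribute__((noinline)) static void jump_back(void) {
    __builtin_longjmp(jump_buf, 1); /* the value must be 1 */
}

/* Returns 0 on the direct path and 1 after returning via
   __builtin_longjmp. */
static int run(int do_jump) {
    if (__builtin_setjmp(jump_buf) == 0) {
        if (do_jump)
            jump_back();
        return 0;
    }
    return 1;
}
```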
2024-12-10 19:50:51 +01:00
Dan Gohman
c5ab70c508
[WebAssembly] Add -i128:128 to the datalayout string. (#119204)
Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM
`datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is
currently using the defaults for both, so it's inconsistent. Fix this by
adding `-i128:128` to Wasm's `datalayout` string so that it aligns
`i128` to 16 bytes too.

This is similar to
[llvm/llvm-project@dbad963](dbad963a69)
for SPARC.

This fixes rust-lang/rust#133991; see that issue for further discussion.

[defaults to aligning `__int128_t` to 16 bytes]:
f8b4182f07/clang/lib/Basic/TargetInfo.cpp (L77)
[default to aligning `i128` to 8 bytes]:
https://llvm.org/docs/LangRef.html#langref-datalayout
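A small sketch of why the mismatch matters. The alignment of `__int128` determines struct layout, so if Clang and the LLVM `datalayout` disagree, the offset of an `__int128` member computed by the frontend can differ from what the backend assumes. The struct and helper names are hypothetical. The expected values assume a target where Clang aligns `__int128` to 16 bytes (for example x86-64, or Wasm as described above); `__int128` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Hypothetical struct mixing a small member with an __int128. */
struct mixed {
    char tag;
    __int128 value;
};

/* With 16-byte alignment, `value` starts at offset 16 and the struct is
   32 bytes. With 8-byte alignment it would start at 8 and be 24 bytes. */
static size_t value_offset(void) { return offsetof(struct mixed, value); }
static size_t mixed_size(void) { return sizeof(struct mixed); }
```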
2024-12-10 09:21:58 -08:00
Pedro Lobo
f28e52274c
[Clang] Change two placeholders from undef to poison [NFC] (#119141)
- Use `poison` instead of `undef` as a phi operand for an unreachable path (the predecessor
will never branch to the BB that uses the value of the phi).
- Call `@llvm.vector.insert` with a `poison` subvec when performing a
`bitcast` from a fixed vector to a scalable vector.
2024-12-10 15:57:55 +00:00
Momchil Velikov
cc1a2ea61e
[AArch64] Implement FP8 SVE intrinsics for widening conversions (#118123)
This patch adds the following intrinsics:
* 8-bit floating-point convert to half-precision and BFloat16.

  // Variants are also available for: _bf16
  svfloat16_t svcvt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
  svfloat16_t svcvt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);

* 8-bit floating-point convert to half-precision and BFloat16 (top).

  // Variants are also available for: _bf16
  svfloat16_t svcvtlt1_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
  svfloat16_t svcvtlt2_f16[_mf8]_fpm(svmfloat8_t zn, fpm_t fpm);
2024-12-10 13:32:05 +00:00
Nikita Popov
e21ab4d16b
[InstCombine] Infer nuw for gep inbounds from base of object (#119225)
When we have a gep inbounds from the base of an object (e.g. alloca or
global), we know that the index cannot be negative, as this would go out
of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary; we could also
accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently
incorrectly doesn't require the inbounds for the alloca case, see
https://github.com/AliveToolkit/alive2/issues/1138).
2024-12-10 10:00:50 +01:00
Daniil Kovalev
ef2e590e7b
Revert "[PAC][ELF][AArch64] Support signed personality function pointer" (#119331)
Reverts llvm/llvm-project#113148

See buildbot failure
https://lab.llvm.org/buildbot/#/builders/190/builds/11048
2024-12-10 09:12:25 +03:00
Daniil Kovalev
4fb1cda660
[PAC][ELF][AArch64] Support signed personality function pointer (#113148)
If function pointer signing is enabled, sign the personality function
pointer stored in the `.DW.ref.__gxx_personality_v0` section using the IA
key, the constant discriminator 0x7EAD
(= `ptrauth_string_discriminator("personality")`), and address diversity.
2024-12-10 08:48:09 +03:00
Jordan Rupprecht
a6b5e18fc6
[test][clang][AArch64] Don't assume current dir is writeable (#119285)
afa2fbf87a8e3fff609fd325c938929c48e94280 adds a test which can fail with
`error: unable to open output file 'fixed-register-global.o':
'Permission denied'`. We don't check the output file at all, so just use
/dev/null.
2024-12-09 20:33:13 -06:00
Thurston Dang
fd57946cc4
[NFC][clang] Update ubsan-trap-merge.c test to show absence of nomerge in non-trap mode (#119280)
This shows that ubsan handlers do not have nomerge attributes in
non-trap mode, even if -ubsan-unique-traps is enabled.

0d15d46362bd6ab5a9a2165805adaab13a7689f4 attaches nomerge but only for
trap mode.

---------

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
2024-12-09 16:21:22 -08:00
Lei Huang
a13ec9cd54
[PowerPC] Update data layout alignment of i128 to 16 (#118004)
Fixes the 64-bit PowerPC part of
https://github.com/llvm/llvm-project/issues/102783.
2024-12-09 18:02:24 -05:00
Nikita Popov
10f315dc9c
[ConstantFolding] Infer getelementptr nuw flag (#119214)
Infer nuw from nusw and nneg. This is the constant expression variant of
https://github.com/llvm/llvm-project/pull/111144.

Proof: https://alive2.llvm.org/ce/z/ihztLy
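The reasoning behind this inference can also be checked exhaustively at a toy bit width. The sketch below models an 8-bit address space, an assumption made purely to keep the search small; it is an illustration, not the Alive2 proof. `nusw` holds when adding the signed offset to the unsigned base stays in range, and `nuw` holds when adding the offset reinterpreted as unsigned stays in range. For non-negative offsets the two coincide, while a negative offset can satisfy `nusw` but not `nuw`. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* nusw: unsigned base + signed offset stays within [0, 255]. */
static bool nusw8(uint8_t base, int8_t off) {
    int sum = (int)base + (int)off;
    return sum >= 0 && sum <= UINT8_MAX;
}

/* nuw: unsigned base + offset reinterpreted as unsigned stays <= 255. */
static bool nuw8(uint8_t base, int8_t off) {
    int sum = (int)base + (int)(uint8_t)off;
    return sum <= UINT8_MAX;
}

/* Checks every base and every non-negative offset: nusw implies nuw. */
static bool nusw_nneg_implies_nuw8(void) {
    for (int b = 0; b <= UINT8_MAX; ++b)
        for (int o = 0; o <= INT8_MAX; ++o)
            if (nusw8((uint8_t)b, (int8_t)o) && !nuw8((uint8_t)b, (int8_t)o))
                return false;
    return true;
}
```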
2024-12-09 16:44:05 +01:00
SpencerAbson
99f6ca9b7b
[AArch64] Implement intrinsics for SME FP8 FMOPA (#118115)
This patch implements the following intrinsics:

8-bit floating-point sum of outer products and accumulate.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
    void svmopa_za16[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");

  // Only if __ARM_FEATURE_SME_F8F32 != 0
    void svmopa_za32[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm,
                                 svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm)
                                 __arm_streaming __arm_inout("za");
```

In accordance with: https://github.com/ARM-software/acle/pull/323/

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
2024-12-09 11:13:08 +00:00
c8ef
f145ff3f70
[clang] constexpr built-in elementwise add_sat/sub_sat functions. (#119082)
Part of #51787.

This patch adds constexpr support for the built-in elementwise add_sat
and sub_sat functions.
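For reference, the saturating semantics that `__builtin_elementwise_add_sat` and `__builtin_elementwise_sub_sat` apply to each element can be sketched with scalar helpers. The names `sat_add_i8`, `sat_sub_i8`, `sat_add_u8` and `sat_sub_u8` are hypothetical and are not the builtins themselves. The builtins apply the same clamping lane-wise to vectors and, after this patch, can be evaluated in constant expressions.

```c
#include <stdint.h>

/* Signed 8-bit saturating add: clamp the exact sum to [INT8_MIN, INT8_MAX].
   The sum is exact because int is wider than int8_t. */
static int8_t sat_add_i8(int8_t a, int8_t b) {
    int sum = a + b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Signed 8-bit saturating subtract. */
static int8_t sat_sub_i8(int8_t a, int8_t b) {
    int diff = a - b;
    if (diff > INT8_MAX) return INT8_MAX;
    if (diff < INT8_MIN) return INT8_MIN;
    return (int8_t)diff;
}

/* Unsigned 8-bit saturating add: clamp to UINT8_MAX. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* Unsigned 8-bit saturating subtract: clamp to 0. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return a > b ? (uint8_t)(a - b) : 0;
}
```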
2024-12-09 09:28:12 +08:00
SpencerAbson
b0f06769e6
[AArch64] Implement intrinsics for SME FP8 F1CVT/F2CVT and BF1CVT/BF2CVT (#118027)
This patch implements the following intrinsics:

8-bit floating-point convert to half-precision or BFloat16 (in-order).
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

In accordance with https://github.com/ARM-software/acle/pull/323.

Co-authored-by: Marian Lukac <marian.lukac@arm.com>
Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
2024-12-08 19:34:01 +00:00
Vitaly Buka
7787328dd6
[ubsan] Improve lowering of @llvm.allow.ubsan.check (#119013)
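This fixes the case where a single hot call site of an inlined function
causes its checks to be removed at all other call sites as well. This
reduces the number of removed checks by up to 50% (depending on the
`cutoff-hot` value).

`ScalarOptimizerLateEPCallback` ran during the CGSCC walk, after inlining
into each function but before that function was itself inlined into its
callers, so each check was lowered before its final inlining context was
known.

Example (the order of steps is given in the comments):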

```
static void overflow() {
  // 1. Inline get/set if possible
  // 2. Simplify
  // 3. LowerAllowCheckPass
  set(get() + get());
}

void test() {
  // 4. Inline
  // 5. Nothing for LowerAllowCheckPass
  overflow();
}
```

With this patch it will look like:
```
static void overflow() {
  // 1. Inline get/set if possible
  // 2. Simplify
  set(get() + get());
}

void test() {
  // 3. Inline
  // 4. Simplify
  overflow();
}

// Later, after inliner CGSCC walk complete:
// 5. LowerAllowCheckPass for `overflow`
// 6. LowerAllowCheckPass for `test`
```
2024-12-07 16:12:58 -08:00
Vitaly Buka
66f9448b4b
[NFC][ubsan] Pre-commit test with missed optimization (#119012)
2024-12-07 14:50:19 -08:00
Igor Kudrin
afa2fbf87a
[Reland][clang][AArch64] Avoid a crash when a non-reserved register is used (#117419)
Relanding the patch with a fix for a test failure on build bots that do
not build LLVM for AArch64.

Fixes #76426, #109778 (for AArch64)

The previous patch for this issue, #94271, generated an error message if
a register and a global variable did not have the same size. This patch
checks if the register is reserved.
2024-12-06 16:13:36 -08:00
Ulrich Weigand
8787bc72a6
Revert "[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642)"
This reverts commit 030bbc92a705758f1131fb29cab5be6d6a27dd1f.
2024-12-07 00:55:54 +01:00
Igor Kudrin
da65fe1c16
Revert "[clang][AArch64] Avoid a crash when a non-reserved register is used (#117419)"
This reverts commit 8fc6fca9f28ce20d76066be66fcc41aa38f7dc3d.
2024-12-06 15:10:40 -08:00
Igor Kudrin
8fc6fca9f2
[clang][AArch64] Avoid a crash when a non-reserved register is used (#117419)
Fixes #76426, #109778 (for AArch64)

The previous patch for this issue, #94271, generated an error message if
a register and a global variable did not have the same size. This patch
checks if the register is reserved.
2024-12-06 14:58:10 -08:00
anoopkg6
030bbc92a7
[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642)
Implements __builtin_setjmp and __builtin_longjmp for SystemZ.
2024-12-06 23:33:33 +01:00
Nikita Popov
b569ec6de6
[SCCP] Infer nuw for gep nusw with non-negative offsets (#118819)
If the GEP is nusw/inbounds and all of its offsets are non-negative,
infer nuw as well.

This doesn't have measurable compile-time impact.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-06 09:52:32 +01:00
Oliver Stannard
f893b47500
[ARM] Fix instruction selection for MVE vsbciq intrinsic (#118284)
There were two bugs in the implementation of the MVE vsbciq (subtract
with carry across vector, with initial carry value) intrinsics:
* The VSBCI instruction behaves as if the carry-in is always set, but we
were selecting it when the carry-in is clear.
* The vsbciq intrinsics should generate IR with the carry-in set, but
they were leaving it clear.

These two bugs almost cancelled each other out, but resulted in
incorrect code when the vsbcq intrinsics (with a carry-in) were used,
and the carry-in was a compile time constant.
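A scalar model of the arithmetic may help here. It is a toy model, not the MVE intrinsic API. Arm's subtract-with-carry computes `a + ~b + carry`, where a carry of 1 means "no borrow". Chaining that across four 32-bit lanes (lane 0 least significant) gives a 128-bit subtraction. Starting with a carry-in of 1, which is what VSBCI assumes, yields a plain `a - b`; starting with 0 subtracts one more. The helper names `sbc32` and `sbc_across4` are hypothetical.

```c
#include <stdint.h>

/* Arm-style subtract with carry on one 32-bit word: computes
   a + ~b + carry_in, where carry_in == 1 means "no borrow".
   Writes the carry-out (1 = no borrow) to *carry_out. */
static uint32_t sbc32(uint32_t a, uint32_t b, unsigned carry_in,
                      unsigned *carry_out) {
    uint64_t wide = (uint64_t)a + (uint64_t)(uint32_t)~b + carry_in;
    *carry_out = (unsigned)(wide >> 32);
    return (uint32_t)wide;
}

/* Toy model of "subtract with carry across vector": the carry ripples
   from lane 0 (least significant) to lane 3. Returns the final carry. */
static unsigned sbc_across4(const uint32_t a[4], const uint32_t b[4],
                            unsigned carry_in, uint32_t out[4]) {
    unsigned carry = carry_in;
    for (int i = 0; i < 4; ++i)
        out[i] = sbc32(a[i], b[i], carry, &carry);
    return carry;
}
```

With `a` = 2^32 and `b` = 1, a carry-in of 1 produces 0xFFFFFFFF in lane 0, while a carry-in of 0 produces 0xFFFFFFFE. This off-by-one is the kind of difference the two bugs above could produce once a constant carry-in was involved.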
2024-12-06 08:46:56 +00:00