9599 Commits

Author SHA1 Message Date
Sander de Smalen
b4ce29ab31
[AArch64][Clang] Add support for __arm_agnostic("sme_za_state") (#121788)
This adds support for parsing the attribute and codegen to map it to
"aarch64_za_state_agnostic" LLVM IR attribute.

This attribute is described in the Arm C Language Extensions (ACLE)
document:

  https://github.com/ARM-software/acle/blob/main/main/acle.md#__arm_agnostic
2025-01-12 21:35:44 +00:00
Vitaly Buka
8af4d206e0
[NFCI][BoundsChecking] Apply nosanitize on local-bounds instrumentation (#122416)
Should be NFCI, as we run sanitizers such as msan before local-bounds.
2025-01-10 18:11:19 -08:00
Vitaly Buka
99d0780f05
[nfc][ubsan] Add local-bounds test (#122415)
Show that @llvm.allow.ubsan.check is not used yet.
2025-01-10 17:47:35 -08:00
Vitaly Buka
834d65eb2e
[nfc][ubsan] Use O1 in test to remove more unrelated stuff (#122408)
2025-01-10 16:41:43 -08:00
Alexandros Lamprineas
b93ffa8e4a
[FMV][AArch64] Changes in fmv-features metadata. (#122192)
* We want the default version to have this attribute too otherwise it
becomes indistinguishable from non-versioned functions.

* We don't need the '+' unlike target-features which can negate. This
will allow using the parsing API of target_version/clones for the
metadata too.
2025-01-10 17:50:35 +00:00
Evgenii Kudriashov
b43c97c2dd
[Headers][X86] amxintrin.h - fix attributes according to Intel SDM (#122204)
`tileloadd`, `tileloaddt1` and `tilestored` are part of `amx-tile`
feature.

The problem is observed if `__tile_loadd` intrinsic is invoked,
`_tile_loadd_internal` requiring `amx-int8` is inlined into
`__tile_loadd` that has only `amx-tile`.
2025-01-10 17:52:09 +01:00
Vitaly Buka
a4394d9d42
[NFC][ubsan] Rename prefixes in test
Looks like update_cc_test_checks gets confused if
it creates variables whose names match a prefix.

Issue triggered with #122415
2025-01-09 21:12:36 -08:00
Nikita Popov
4847395c54
[Clang] Adjust pointer-overflow sanitizer for N3322 (#120719)
N3322 makes NULL + 0 well-defined in C, matching the C++ semantics.
Adjust the pointer-overflow sanitizer to no longer report NULL + 0 as a
pointer overflow in any language mode. NULL + nonzero will of course
continue to be reported.

As N3322 is part of
https://www.open-std.org/jtc1/sc22/wg14/www/previous.html, and we never
performed any optimizations based on NULL + 0 being undefined in the
first place, I'm applying this change to all C versions.
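
A minimal sketch of the kind of code this affects, assuming a hypothetical
helper `sum_bytes` that is not part of the patch. An empty buffer described
as (NULL, 0) makes `data + len` evaluate NULL + 0, which N3322 makes
well-defined and which the pointer-overflow sanitizer no longer reports:

```c
#include <stddef.h>

/* Hypothetical helper: sums the bytes in [data, data + len).
   For an empty buffer passed as (NULL, 0), `data + len` is NULL + 0. */
unsigned sum_bytes(const unsigned char *data, size_t len) {
    const unsigned char *end = data + len; /* NULL + 0 when the buffer is empty */
    unsigned total = 0;
    for (const unsigned char *p = data; p != end; ++p)
        total += *p;
    return total;
}
```

Calling `sum_bytes(NULL, 1)` would still compute NULL + nonzero and would
continue to be reported.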
2025-01-09 09:23:23 +01:00
Alexandros Lamprineas
8e65940161
[FMV][AArch64] Simplify version selection according to ACLE. (#121921)
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:

"Among any two versions, the higher priority version is determined by
 identifying the highest priority feature that is specified in exactly
 one of the versions, and selecting that version."
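
The quoted rule can be sketched as follows. This is an illustration only,
not the Clang implementation: it assumes a hypothetical model in which each
feature is one bit and a more significant bit means a higher-priority
feature. The names `FEAT_A`, `FEAT_B`, `FEAT_C` and `compare_versions` are
made up for the example.

```c
#include <stdint.h>

/* Hypothetical features, lowest priority first. */
enum { FEAT_A = 1u << 0, FEAT_B = 1u << 1, FEAT_C = 1u << 2 };

/* Returns 1 if version a has higher priority than version b, -1 if b
   has higher priority, and 0 if both name exactly the same features. */
int compare_versions(uint32_t a, uint32_t b) {
    uint32_t only_one = a ^ b; /* features specified in exactly one version */
    if (only_one == 0)
        return 0;
    /* Find the highest-priority feature among those. */
    uint32_t top = UINT32_C(1) << 31;
    while (!(only_one & top))
        top >>= 1;
    /* The version that specifies it is selected. */
    return (a & top) ? 1 : -1;
}
```

Under this model a version naming only `FEAT_C` outranks one naming
`FEAT_A` and `FEAT_B`, even though the latter has more features.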
2025-01-08 18:59:07 +00:00
Florian Hahn
346fad5c2c
[TBAA] Simplify checks for unnamed struct case, where anyptr is used.
2025-01-08 14:08:28 +00:00
Florian Hahn
9fc152d25e
[TBAA] Add Clang pointer TBAA test with void *.
2025-01-08 11:54:02 +00:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using metadata, which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
Florian Hahn
473cdb93e5
[TySan] Don't report globals with incomplete types. (#121922)
Type metadata for incomplete types should also get handled at the place
they are defined.

Fixes https://github.com/llvm/llvm-project/issues/121014.

PR: https://github.com/llvm/llvm-project/pull/121922
2025-01-07 15:06:26 +00:00
天音あめ
ca5fd06366
[clang] Fix crashes when passing VLA to va_arg (#119563)
Closes #119360.

This bug occurs when passing a VLA to `va_arg`. Since the return value
is inferred to be an array, it triggers
`ScalarExprEmitter::VisitCastExpr`, which converts it to a pointer and
subsequently calls `CodeGenFunction::EmitAggExpr`. At this point,
because the inferred type is an `AggExpr` instead of a `ScalarExpr`,
`ScalarExprEmitter::VisitVAArgExpr` is not invoked, and as a result,
`CodeGenFunction::EmitVariablyModifiedType` is also not called, leading
to the size of the VLA not being retrieved.
The solution is to move the call to
`CodeGenFunction::EmitVariablyModifiedType` into
`CodeGenFunction::EmitVAArg`, ensuring that the size of the VLA is
correctly obtained regardless of whether the expression is an `AggExpr`
or a `ScalarExpr`.
2025-01-07 07:49:43 -05:00
Nicholas Guy
21b531ead1
[clang][llvm][aarch64] Add aarch64_sme_in_streaming_mode intrinsic (#120265)
Replacing the extant streaming mode function call with an intrinsic
allows us to make further optimisations around it. For example, if it's
called within a function that has a known streaming mode, we can remove
the dead code, and avoid the redundant conditional branch.
2025-01-07 09:02:26 +00:00
Alexandros Lamprineas
93011fe2a5
[FMV][AArch64][clang] Emit fmv-features metadata in LLVM IR. (#118544)
We need to be able to propagate information about FMV attribute strings
from C/C++ source to LLVM IR. This is necessary so that we can
distinguish which target-features are coming from the cmdline, which are
coming from the target attribute, and which are coming from feature
dependency expansion. We need this for static resolution of calls in
LLVM. Here's a motivating example:

Suppose you have target_version("i8mm+dotprod") and
target_version("fcma"). The first version clearly has higher priority.
Now suppose you specify -march=armv8-a+i8mm on the command line. Then
the versions would have target-features "+i8mm,+dotprod" and
"+i8mm,+fcma" respectively. If you are using those to deduce version
priority, then you would incorrectly deduce that the second version was
higher priority than the first.
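
The motivating example can be replayed with a small model. It is a sketch
under stated assumptions: features are bits, the priority order
dotprod < fcma < i8mm is assumed for illustration, and the helper names
`outranks` and `with_cmdline` are hypothetical.

```c
#include <stdint.h>

/* Assumed, illustrative priority order: dotprod < fcma < i8mm. */
enum { FEAT_DOTPROD = 1u << 0, FEAT_FCMA = 1u << 1, FEAT_I8MM = 1u << 2 };

/* Nonzero if feature set a outranks b: the highest-priority feature
   present in exactly one of the two sets decides. */
int outranks(uint32_t a, uint32_t b) {
    uint32_t diff = a ^ b;
    if (diff == 0)
        return 0;
    uint32_t top = UINT32_C(1) << 31;
    while (!(diff & top))
        top >>= 1;
    return (a & top) != 0;
}

/* Models target-features: the version's own features plus the
   features enabled on the command line. */
uint32_t with_cmdline(uint32_t version, uint32_t cmdline) {
    return version | cmdline;
}
```

Comparing the attribute strings directly ranks "i8mm+dotprod" first. After
merging in a command-line `+i8mm`, both sets contain i8mm, so the comparison
falls through to fcma versus dotprod and ranks the second version first,
which is the wrong deduction described above.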
2025-01-07 08:51:23 +00:00
David Green
ca603d2536
[AArch64] Regenerate neon-vcmla.c test. NFC
This removes -O1 from the opt pipeline, using just mem2reg,instsimplify
instead. The target is changed so that the auto update script will apply.
2025-01-06 16:26:41 +00:00
Joseph Huber
81fae0d5e3
[Clang][AMDGPU] Stop defaulting to one-as for all atomic scopes (#120095)
Summary:
The documentation at
https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these
'one-as' modifiers are more specific versions of the scopes that only
apply to a specific address space. This doesn't make sense for fences,
which have no associated address space to use, and it's a more
restrictive version of the normal scope. This should not be the default
behavior, but it is currently emitted in all cases except for
sequentially consistent.
2025-01-06 08:11:08 -06:00
Kerry McLaughlin
d8d4c18761
[AArch64][SME] Disable inlining of callees with new ZT0 state (#121338)
Inlining must be disabled for new-ZT0 callees as the callee is required
to save ZT0 and toggle PSTATE.ZA on entry.
2025-01-06 12:02:28 +00:00
Benjamin Maxwell
e4e2f53693
[clang] Add sincos builtin using llvm.sincos intrinsic (#114086)
This registers `sincos[f|l]` as a clang builtin and updates CGBuiltin to
emit the `llvm.sincos.*` intrinsic when `-fno-math-errno` is set. Note:
`llvm.sincos.*` is only emitted by `__builtin_sincos[f|l]` functions in
this initial patch.
2025-01-06 11:07:25 +00:00
Fangrui Song
82fecab85a
[gcov] Bump default version to 11.1
The gcov version is set to 11.1 (compatible with gcov 9) even if
`-Xclang -coverage-version=` specified version is less than 11.1.

Therefore, we can drop producer support for version < 11.1.
2025-01-02 23:01:28 -08:00
Timm Baeder
45e874e390
[clang][bytecode] Check for memcpy/memmove dummy pointers earlier (#121453)
2025-01-02 09:15:14 +01:00
Stephen Senran Zhang
2feffecb88
[ConstantRange] Estimate tighter lower (upper) bounds for masked binary and (or) (#120352)
Fixes #118108.

Co-author: Yingwei Zheng (@dtcxzyw)
2024-12-31 18:40:17 -08:00
Momchil Velikov
f70ab7d909
[AArch64] Fix argument passing for SVE tuples (#118961)
The fix for passing Pure Scalable Types
(https://github.com/llvm/llvm-project/pull/112747) was incomplete:
it did not correctly handle tuples of SVE vectors (e.g. `svboolx2_t`,
`svfloat32x4_t`).

These types are Pure Scalable Types and should be passed either entirely
in vector registers or indirectly in memory, not split.
2024-12-23 09:26:24 +00:00
Phoebe Wang
113177f98b
[X86][AVX10.2] Fix wrong mask bits in cvtpbf8_ph intrinsics (#120927)
Found during review #120766
2024-12-23 17:13:56 +08:00
Thurston Dang
5bb650345d
Remove -bounds-checking-unique-traps (replace with -fno-sanitize-merge=local-bounds) (#120682)
#120613 removed -ubsan-unique-traps and replaced it with
-fno-sanitize-merge (introduced in #120511), which allows fine-grained
control of which UBSan checks to prevent merging. This analogous patch
removes -bounds-checking-unique-traps, and allows it to be controlled via
-fno-sanitize-merge=local-bounds.

Most of this patch is simply plumbing through the compiler flags into
the bounds checking pass.

Note: this patch subtly changes -fsanitize-merge (the default) to also
include -fsanitize-merge=local-bounds. This is different from the
previous behavior, where -fsanitize-merge (or the old
-ubsan-unique-traps) did not affect local-bounds (requiring the separate
-bounds-checking-unique-traps). However, we argue that the new behavior
is more intuitive.

Removing -bounds-checking-unique-traps and merging its functionality
into -fsanitize-merge breaks backwards compatibility; we hope that this
is acceptable since '-mllvm -bounds-checking-unique-traps' was an
experimental flag.
2024-12-20 10:07:44 -08:00
Mikhail Goncharov
93743ee566
Revert "[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449)"
This reverts commit 9fc2fadbfcb34df5f72bdaed28a7874bf584eed7.

See https://github.com/llvm/llvm-project/pull/120449#issuecomment-2556089016
2024-12-20 08:14:26 +01:00
Vitaly Buka
c2aee50620
[ubsan] Runtime and driver support for local-bounds (#120515)
Implements ``-f[no-]sanitize-trap=local-bounds``,
and ``-f[no-]sanitize-recover=local-bounds``.

LLVM part is here #120513.
2024-12-19 16:38:07 -08:00
Thurston Dang
d33a2c5811
[BoundsSan] Update BoundsChecking.cpp to use no-merge attribute where applicable (#120620)
https://github.com/llvm/llvm-project/pull/65972 introduced
-ubsan-unique-traps and -bounds-checking-unique-traps, which attach the
function size to the ubsantrap intrinsic.

https://github.com/llvm/llvm-project/pull/117651 changed
ubsan-unique-traps to use nomerge instead of the function size, but did
not update -bounds-checking-unique-traps. This patch adds nomerge to
bounds-checking-unique-traps.
2024-12-19 13:31:29 -08:00
Florian Hahn
9e322c56f7
[TySan] Don't report globals with external storage. (#120565)
Globals with external storage should have been initialized where they
are defined.

Fixes https://github.com/llvm/llvm-project/issues/120448

PR: https://github.com/llvm/llvm-project/pull/120565
2024-12-19 21:30:56 +00:00
Thurston Dang
cb8a90b7d1
[ubsan] Remove -ubsan-unique-traps (replace with -fno-sanitize-merge) (#120613)
-fno-sanitize-merge (introduced in
https://github.com/llvm/llvm-project/pull/120511) duplicates the
functionality of -ubsan-unique-traps but also allows individual checks
to be specified e.g.,
* "-fno-sanitize-merge" without arguments is equivalent to
-ubsan-unique-traps
* "-fno-sanitize-merge=bool,enum" will apply it only to those two checks
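
A small C file for trying the flag, offered as a sketch. The
`-fno-sanitize-merge=bool` spelling comes from this message; the rest of the
command line in the comment and the function name `count_true` are
assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Possible build line (assumed, adjust to your setup):
     clang -O1 -fsanitize=bool -fsanitize-trap=bool \
           -fno-sanitize-merge=bool example.c
   Each load of a bool below is a candidate for its own UBSan check.
   With merging enabled, failing checks may share a single trap; with
   -fno-sanitize-merge=bool each check is expected to keep its own. */
int count_true(const unsigned char *raw_a, const unsigned char *raw_b) {
    bool a, b;
    memcpy(&a, raw_a, sizeof a); /* a byte other than 0 or 1 is an invalid bool */
    memcpy(&b, raw_b, sizeof b);
    return (int)a + (int)b;      /* loads of a and b are checked by -fsanitize=bool */
}
```

With valid inputs (bytes 0 or 1) the function behaves the same with or
without the sanitizer; the flag only changes how failing checks are laid out
in the generated code.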

Additionally, the naming is more consistent with the rest of the
-fsanitize- family.

This patch therefore removes -ubsan-unique-traps. This breaks backwards
compatibility; we hope that this is acceptable since '-mllvm
-ubsan-unique-traps' was an experimental flag.

This patch also adds negative test examples to bounds-checking.c, and
strengthens the NOOPTARRAY assertion to prevent spurious matches.

"-bounds-checking-unique-traps" is unaffected by this patch.
2024-12-19 12:53:48 -08:00
SpencerAbson
9469fd24b9
[Clang][AArch64] Remove const from base pointers in sve2p1 stores (#120551)
This patch removes the const qualifier from the base pointer argument of
`svst1wq`/`svst1wq_vnum` and `svst1dq`/`svst1dq_vnum`, in accordance
with https://github.com/ARM-software/acle/pull/359.
2024-12-19 14:13:02 +00:00
SpencerAbson
db84ae3a68
[Clang][AArch64] Add signed index/offset variants of sve2p1 qword stores (#120549)
This patch adds signed offset/index variants to the SVE2p1 quadword
store intrinsics, in accordance with
https://github.com/ARM-software/acle/pull/359.
2024-12-19 13:27:07 +00:00
Alexandros Lamprineas
6586c676b4
[FMV][AArch64] Emit mangled default version if explicitly specified. (#120022)
Currently we need at least one more version other than the default to
trigger FMV. However we would like a header file declaration

__attribute__((target_version("default"))) void f(void);

to guarantee that there will be f.default
2024-12-19 12:06:46 +00:00
Oliver Stannard
9fc2fadbfc
[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449)
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.

This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.

Fixes #111293.
2024-12-19 09:12:19 +00:00
Thurston Dang
ffff7bb582
Reapply "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120…464)" (#120511)
This reverts commit 2691b964150c77a9e6967423383ad14a7693095e. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, which is merge, is applied by the driver but not
clang_cc1).

This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.

----

Original commit message:
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
2024-12-18 18:13:26 -08:00
Thurston Dang
2691b96415
Revert "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)"
This reverts commit 7eaf4708098c216bf432fc7e0bc79c3771e793a4.

Reason: buildbot breakage (e.g.,
https://lab.llvm.org/buildbot/#/builders/144/builds/14299/steps/6/logs/FAIL__Clang__ubsan-trap-debugloc_c)
2024-12-18 23:50:01 +00:00
Thurston Dang
7eaf470809
[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464)
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
apply non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).

N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).

This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
2024-12-18 15:36:12 -08:00
Alexander Kornienko
23a239267e
Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460)
Reverts llvm/llvm-project#119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).

Related discussions:
https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822
https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255
2024-12-18 19:06:34 +01:00
Florian Hahn
c135f6ffe2
[TySan] Add initial Type Sanitizer support to Clang) (#76260)
This patch introduces the Clang components of type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

The Clang changes are mostly formulaic, the one specific change being
that when the TBAA sanitizer is enabled, TBAA is always generated, even
at -O0.

It goes together with the corresponding LLVM changes
(https://github.com/llvm/llvm-project/pull/76259) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)

PR: https://github.com/llvm/llvm-project/pull/76260
2024-12-17 15:13:42 +00:00
SpencerAbson
908e30658d
[AArch64] Implement intrinsics for FP8 SME FMLAL/FMLALL (multi) (#119546)
This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long (multiple vectors).

``` c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");                              
```

In accordance with https://github.com/ARM-software/acle/pull/323
2024-12-17 11:47:20 +00:00
SpencerAbson
9c89b40f18
[AArch64] Implement intrinsics for FMLAL/FMLALL (single) (#119568)
Multi-vector 8-bit floating-point multiply-add long (single)
```c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla[_single]_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla[_single]_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
```
In accordance with https://github.com/ARM-software/acle/pull/323.

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2024-12-17 09:31:54 +00:00
Florian Mayer
514580b438
[MTE] Apply alignment / size in AsmPrinter rather than IR (#111918)
This makes sure no optimizations are applied that assume the
bigger alignment or size, which could be incorrect if we link
together with non-instrumented code.
2024-12-17 00:47:02 -08:00
SpencerAbson
38099d0608
[AArch64] Implement intrinsics for SME FP8 FMLAL/FMLALL (Indexed) (#118549)
This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
  void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  // Only if __ARM_FEATURE_SME_F8F32 != 0
  void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");
```
In accordance with: https://github.com/ARM-software/acle/pull/323
2024-12-16 21:45:38 +00:00
Jonathan Thackray
8380bafaed
[AArch64] Add intrinsics for SME FP8 FVDOT, FVDOTB and FVDOTT intrinsics (#119922)
Add support for the following SME 8-bit floating-point dot-product
intrinsics:

```
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svvdot_lane_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                      svmfloat8_t zm, uint64_t imm_idx,
                                      fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
void svvdott_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

void svvdotb_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");
```

---------

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
2024-12-16 14:42:45 +00:00
Jonathan Thackray
ef4b597015
[AArch64] Add intrinsics for SME FP8 FDOT single and multi instructions (#119845)
Add support for the following SME 8-bit floating-point dot-product intrinsics:

```
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svdot[_single]_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot[_single]_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
void svdot[_single]_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot[_single]_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm,
                                         fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svdot_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
```

These intrinsics are extracted from:
https://github.com/ARM-software/acle/pull/323/

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Co-authored-by: Marian Lukac <marian.lukac@arm.com>
2024-12-16 13:14:40 +00:00
Daniil Kovalev
f65a21a4ec
[PAC][ELF][AArch64] Support signed personality function pointer (#119361)
Re-apply #113148 after revert in #119331

If function pointer signing is enabled, sign the personality function
pointer stored in the `.DW.ref.__gxx_personality_v0` section with the IA
key, the constant discriminator 0x7EAD =
`ptrauth_string_discriminator("personality")`, and address diversity
enabled.
2024-12-16 10:24:09 +03:00
Momchil Velikov
2eed88da6a
[AArch64] Implement FP8 SVE intrinsics for fused multiply-add (#118126)
This patch adds the following intrinsics:

* 8-bit floating-point multiply-add long to half-precision (bottom).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat16_t svmlalb[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat16_t svmlalb[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long to half-precision (bottom,
indexed).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat16_t svmlalb_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm0_15,
                                       fpm_t fpm);

* 8-bit floating-point multiply-add long to half-precision (top).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat16_t svmlalt[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat16_t svmlalt[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long to half-precision (top,
indexed).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat16_t svmlalt_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm0_15,
                                       fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
(bottom bottom).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlallbb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat32_t svmlallbb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
(bottom bottom, indexed).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlallbb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                         svmfloat8_t zm, uint64_t imm0_15,
                                         fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
(bottom top).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlallbt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat32_t svmlallbt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision
(bottom top, indexed).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlallbt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                         svmfloat8_t zm, uint64_t imm0_15,
                                         fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision (top
bottom).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlalltb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat32_t svmlalltb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision (top
bottom, indexed).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlalltb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                         svmfloat8_t zm, uint64_t imm0_15,
                                         fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision (top
top).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlalltt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat32_t svmlalltt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point multiply-add long long to single-precision (top
top, indexed).

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) ||
__ARM_FEATURE_SSVE_FP8FMA
svfloat32_t svmlalltt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                         svmfloat8_t zm, uint64_t imm0_15,
                                         fpm_t fpm);
2024-12-13 21:05:27 +00:00
Momchil Velikov
c2172431c7
[AArch64] Implements FP8 SVE intrinsics for dot-product (#118125)
This patch adds the following intrinsics:

* 8-bit floating-point dot product to single-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) ||
__ARM_FEATURE_SSVE_FP8DOT4
svfloat32_t svdot[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat32_t svdot[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point indexed dot product to single-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) ||
__ARM_FEATURE_SSVE_FP8DOT4
svfloat32_t svdot_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn,
                                     svmfloat8_t zm, uint64_t imm0_3,
                                     fpm_t fpm);

* 8-bit floating-point dot product to half-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) ||
__ARM_FEATURE_SSVE_FP8DOT2
svfloat16_t svdot[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
svmfloat8_t zm, fpm_t fpm);
svfloat16_t svdot[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
mfloat8_t zm, fpm_t fpm);

* 8-bit floating-point indexed dot product to half-precision.

// Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) ||
__ARM_FEATURE_SSVE_FP8DOT2
svfloat16_t svdot_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn,
                                     svmfloat8_t zm, uint64_t imm0_7,
                                     fpm_t fpm);
2024-12-13 14:06:54 +00:00
Nikita Popov
a30e50fcb3
[BasicAA] Do not decompose past casts with different index width (#119365)
BasicAA currently tries to support addrspacecasts that change the index
width by performing the decomposition in the maximum of all index widths
and then trying to fix this up with in-place sign extends to get correct
overflow behavior if the actual index width is smaller.

However, even in the case where we don't mix different index widths and
just have an index width that is smaller than the maximum, the behavior
is incorrect (see test), because we only perform the index width
adjustment during decomposition and not any of the later logic -- and we
don't do anything at all for variable offsets. I'm sure that the case
where we actually mix different index widths is even more broken than
that.

Fix this by not allowing decomposition through index width changes. If
the pointers have different index widths, fall back to a base object
comparison, ignoring the offsets.
2024-12-13 12:58:59 +01:00