llvm-project

Author	SHA1	Message	Date
Sander de Smalen	b4ce29ab31	[AArch64][Clang] Add support for __arm_agnostic("sme_za_state") (#121788 ) This adds support for parsing the attribute and codegen to map it to "aarch64_za_state_agnostic" LLVM IR attribute. This attribute is described in the Arm C Language Extensions (ACLE) document: https://github.com/ARM-software/acle/blob/main/main/acle.md#__arm_agnostic	2025-01-12 21:35:44 +00:00
Vitaly Buka	8af4d206e0	[NFCI][BoundsChecking] Apply nosanitize on local-bounds instrumentation (#122416 ) Should be NFCI as we run sanitizer, like msan, before local-bounds.	2025-01-10 18:11:19 -08:00
Vitaly Buka	99d0780f05	[nfc][ubsan] Add local-bounds test (#122415 ) Show that @llvm.allow.ubsan.check is not used yet.	2025-01-10 17:47:35 -08:00
Vitaly Buka	834d65eb2e	[nfc][ubsan] Use O1 in test to remove more unrelated stuff (#122408 )	2025-01-10 16:41:43 -08:00
Alexandros Lamprineas	b93ffa8e4a	[FMV][AArch64] Changes in fmv-features metadata. (#122192 ) * We want the default version to have this attribute too otherwise it becomes indistinguishable from non-versioned functions. * We don't need the '+' unlike target-features which can negate. This will allow using the parsing API of target_version/clones for the metadata too.	2025-01-10 17:50:35 +00:00
Evgenii Kudriashov	b43c97c2dd	[Headers][X86] amxintrin.h - fix attributes according to Intel SDM (#122204 ) `tileloadd`, `tileloaddt1` and `tilestored` are part of `amx-tile` feature. The problem is observed if `__tile_loadd` intrinsic is invoked, `_tile_loadd_internal` requiring `amx-int8` is inlined into `__tile_loadd` that has only `amx-tile`.	2025-01-10 17:52:09 +01:00
Vitaly Buka	a4394d9d42	[NFC][ubsan] Rename prefixes in test Looks like update_cc_test_checks is being confused if it creates vars with the name matching prefix. Issue triggered with #122415	2025-01-09 21:12:36 -08:00
Nikita Popov	4847395c54	[Clang] Adjust pointer-overflow sanitizer for N3322 (#120719 ) N3322 makes NULL + 0 well-defined in C, matching the C++ semantics. Adjust the pointer-overflow sanitizer to no longer report NULL + 0 as a pointer overflow in any language mode. NULL + nonzero will of course continue to be reported. As N3322 is part of https://www.open-std.org/jtc1/sc22/wg14/www/previous.html, and we never performed any optimizations based on NULL + 0 being undefined in the first place, I'm applying this change to all C versions.	2025-01-09 09:23:23 +01:00
Alexandros Lamprineas	8e65940161	[FMV][AArch64] Simplify version selection according to ACLE. (#121921 ) Currently, the more features a version has, the higher its priority is. We are changing ACLE https://github.com/ARM-software/acle/pull/370 as follows: "Among any two versions, the higher priority version is determined by identifying the highest priority feature that is specified in exactly one of the versions, and selecting that version."	2025-01-08 18:59:07 +00:00
Florian Hahn	346fad5c2c	[TBAA] Simplify checks for unnamed struct case, where anyptr is used.	2025-01-08 14:08:28 +00:00
Florian Hahn	9fc152d25e	[TBAA] Add Clang pointer TBAA test with void *.	2025-01-08 11:54:02 +00:00
Alex MacLean	4583f6d344	[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806 ) the `ptx_kernel` calling convention is a more idiomatic and standard way of specifying a NVPTX kernel than using the metadata which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time. This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.	2025-01-07 18:24:50 -08:00
Florian Hahn	473cdb93e5	[TySan] Don't report globals with incomplete types. (#121922 ) Type metadata for incomplete types should also get handled at the place they are defined. Fixes https://github.com/llvm/llvm-project/issues/121014. PR: https://github.com/llvm/llvm-project/pull/121922	2025-01-07 15:06:26 +00:00
天音あめ	ca5fd06366	[clang] Fix crashes when passing VLA to va_arg (#119563 ) Closes #119360. This bug occurs when passing a VLA to `va_arg`. Since the return value is inferred to be an array, it triggers `ScalarExprEmitter::VisitCastExpr`, which converts it to a pointer and subsequently calls `CodeGenFunction::EmitAggExpr`. At this point, because the inferred type is an `AggExpr` instead of a `ScalarExpr`, `ScalarExprEmitter::VisitVAArgExpr` is not invoked, and as a result, `CodeGenFunction::EmitVariablyModifiedType` is also not called, leading to the size of the VLA not being retrieved. The solution is to move the call to `CodeGenFunction::EmitVariablyModifiedType` into `CodeGenFunction::EmitVAArg`, ensuring that the size of the VLA is correctly obtained regardless of whether the expression is an `AggExpr` or a `ScalarExpr`.	2025-01-07 07:49:43 -05:00
Nicholas Guy	21b531ead1	[clang][llvm][aarch64] Add aarch64_sme_in_streaming_mode intrinsic (#120265 ) Replacing the extant streaming mode function call with an intrinsic allows us to make further optimisations around it. For example, if it's called within a function that has a known streaming mode, we can remove the dead code, and avoid the redundant conditional branch.	2025-01-07 09:02:26 +00:00
Alexandros Lamprineas	93011fe2a5	[FMV][AArch64][clang] Emit fmv-features metadata in LLVM IR. (#118544 ) We need to be able to propagate information about FMV attribute strings from C/C++ source to LLVM IR. This is necessary so that we can distinguish which target-features are coming from the cmdline, which are coming from the target attribute, and which are coming from feature dependency expansion. We need this for static resolution of calls in LLVM. Here's a motivating example: Suppose you have target_version("i8mm+dotprod") and target_version("fcma"). The first version clearly has higher priority. Now suppose you specify -march=armv8-a+i8mm on the command line. Then the versions would have target-features "+i8mm,+dotprod" and "+i8mm,+fcma" respectively. If you are using those to deduce version priority, then you would incorrectly deduce that the second version was higher priority than the first.	2025-01-07 08:51:23 +00:00
David Green	ca603d2536	[AArch64] Regenerate neon-vcmla.c test. NFC This removes -O1 from the opt pipeline, using just mem2reg,instsimplify instead. The target is changed so that the auto update script will apply.	2025-01-06 16:26:41 +00:00
Joseph Huber	81fae0d5e3	[Clang][AMDGPU] Stop defaulting to `one-as` for all atomic scopes (#120095 ) Summary: The documentation at https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these 'one-as' modifiers are more specific versions of the scopes that only apply to a specific address space. This doesn't make sense for fences which have no associated address space to use, and it's a more restrictive version the normal scope. This should not tbe the default behavior, but it is currently emitted in all cases except for sequentially consistent.	2025-01-06 08:11:08 -06:00
Kerry McLaughlin	d8d4c18761	[AArch64][SME] Disable inlining of callees with new ZT0 state (#121338 ) Inlining must be disabled for new-ZT0 callees as the callee is required to save ZT0 and toggle PSTATE.ZA on entry.	2025-01-06 12:02:28 +00:00
Benjamin Maxwell	e4e2f53693	[clang] Add sincos builtin using `llvm.sincos` intrinsic (#114086 ) This registers `sincos[f\|l]` as a clang builtin and updates GCBuiltin to emit the `llvm.sincos.` intrinsic when `-fno-math-errno` is set. Note: `llvm.sincos.` is only emitted by `__builtin_sincos[f\|l]` functions in this initial patch.	2025-01-06 11:07:25 +00:00
Fangrui Song	82fecab85a	[gcov] Bump default version to 11.1 The gcov version is set to 11.1 (compatible with gcov 9) even if `-Xclang -coverage-version=` specified version is less than 11.1. Therefore, we can drop producer support for version < 11.1.	2025-01-02 23:01:28 -08:00
Timm Baeder	45e874e390	[clang][bytecode] Check for memcpy/memmove dummy pointers earlier (#121453 )	2025-01-02 09:15:14 +01:00
Stephen Senran Zhang	2feffecb88	[ConstantRange] Estimate tighter lower (upper) bounds for masked binary and (or) (#120352 ) Fixes #118108. Co-author: Yingwei Zheng (@dtcxzyw)	2024-12-31 18:40:17 -08:00
Momchil Velikov	f70ab7d909	[AArch64] Fix argument passing for SVE tuples (#118961 ) The fix for passing Pure Scalable Types (https://github.com/llvm/llvm-project/pull/112747) was incomplete, it didn't handle correctly tuples of SVE vectors (e.g. `sveboolx2_t`, `svfloat32x4_t`, etc). These types are Pure Scalable Types and should be passed either entirely in vector registers or indirectly in memory, not split.	2024-12-23 09:26:24 +00:00
Phoebe Wang	113177f98b	[X86][AVX10.2] Fix wrong mask bits in cvtpbf8_ph intrinsics (#120927 ) Found during review #120766	2024-12-23 17:13:56 +08:00
Thurston Dang	5bb650345d	Remove -bounds-checking-unique-traps (replace with -fno-sanitize-merge=local-bounds) (#120682 ) #120613 removed -ubsan-unique-traps and replaced it with -fno-sanitize-merge (introduced in #120511), which allows fine-grained control of which UBSan checks to prevent merging. This analogous patch removes -bound-checking-unique-traps, and allows it to be controlled via -fno-sanitize-merge=local-bounds. Most of this patch is simply plumbing through the compiler flags into the bounds checking pass. Note: this patch subtly changes -fsanitize-merge (the default) to also include -fsanitize-merge=local-bounds. This is different from the previous behavior, where -fsanitize-merge (or the old -ubsan-unique-traps) did not affect local-bounds (requiring the separate -bounds-checking-unique-traps). However, we argue that the new behavior is more intuitive. Removing -bounds-checking-unique-traps and merging its functionality into -fsanitize-merge breaks backwards compatibility; we hope that this is acceptable since '-mllvm -bounds-checking-unique-traps' was an experimental flag.	2024-12-20 10:07:44 -08:00
Mikhail Goncharov	93743ee566	Revert "[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449 )" This reverts commit 9fc2fadbfcb34df5f72bdaed28a7874bf584eed7. See https://github.com/llvm/llvm-project/pull/120449#issuecomment-2556089016	2024-12-20 08:14:26 +01:00
Vitaly Buka	c2aee50620	[ubsan] Runtime and driver support for local-bounds (#120515 ) Implements ``-f[no-]sanitize-trap=local-bounds``, and ``-f[no-]sanitize-recover=local-bounds``. LLVM part is here #120513.	2024-12-19 16:38:07 -08:00
Thurston Dang	d33a2c5811	[BoundsSan] Update BoundsChecking.cpp to use no-merge attribute where applicable (#120620 ) https://github.com/llvm/llvm-project/pull/65972 introduced -ubsan-unique-traps and -bounds-checking-unique-traps, which attach the function size to the ubsantrap intrinsic. https://github.com/llvm/llvm-project/pull/117651 changed ubsan-unique-traps to use nomerge instead of the function size, but did not update -bounds-checking-unique-traps. This patch adds nomerge to bounds-checking-unique-traps.	2024-12-19 13:31:29 -08:00
Florian Hahn	9e322c56f7	[TySan] Don't report globals with external storage. (#120565 ) Globals with external storage should have been initialized where they are defined. Fixes https://github.com/llvm/llvm-project/issues/120448 PR: https://github.com/llvm/llvm-project/pull/120565	2024-12-19 21:30:56 +00:00
Thurston Dang	cb8a90b7d1	[ubsan] Remove -ubsan-unique-traps (replace with -fno-sanitize-merge) (#120613 ) -fno-sanitize-merge (introduced in https://github.com/llvm/llvm-project/pull/120511) duplicates the functionality of -ubsan-unique-traps but also allows individual checks to be specified e.g., * "-fno-sanitize-merge" without arguments is equivalent to -ubsan-unique-traps * "-fno-sanitize-merge=bool,enum" will apply it only to those two checks Additionally, the naming is more consistent with the rest of the -fsanitize- family. This patch therefore removes -ubsan-unique-traps. This breaks backwards compatibility; we hope that this is acceptable since '-mllvm -ubsan-unique-traps' was an experimental flag. This patch also adds negative test examples to bounds-checking.c, and strengthens the NOOPTARRAY assertion to prevent spurious matches. "-bounds-checking-unique-traps" is unaffected by this patch.	2024-12-19 12:53:48 -08:00
SpencerAbson	9469fd24b9	[Clang][AArch64] Remove const from base pointers in sve2p1 stores (#120551 ) This patch removes the const qualifier from the base pointer argument of `svst1wq`/`svst1wq_vnum` and `svst1dq`/`svst1dq_vnum`, in accordance with https://github.com/ARM-software/acle/pull/359.	2024-12-19 14:13:02 +00:00
SpencerAbson	db84ae3a68	[Clang][AArch64] Add signed index/offset variants of sve2p1 qword stores (#120549 ) This patch adds signed offset/index variants to the SVE2p1 quadword store intrinsics, in accordance with https://github.com/ARM-software/acle/pull/359.	2024-12-19 13:27:07 +00:00
Alexandros Lamprineas	6586c676b4	[FMV][AArch64] Emit mangled default version if explicitly specified. (#120022 ) Currently we need at least one more version other than the default to trigger FMV. However we would like a header file declaration __attribute__((target_version("default"))) void f(void); to guarantee that there will be f.default	2024-12-19 12:06:46 +00:00
Oliver Stannard	9fc2fadbfc	[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449 ) Re-write the sema and codegen for the atomic_test_and_set and atomic_clear builtin functions to go via AtomicExpr, like the other atomic builtins do. This simplifies the code, because AtomicExpr already handles things like generating code for to dynamically select the memory ordering, which was duplicated for these builtins. This also fixes a few crash bugs, one when passing an integer to the pointer argument, and one when using an array. This also adds diagnostics for the memory orderings which are not valid for atomic_clear according to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which were missing before. Fixes #111293.	2024-12-19 09:12:19 +00:00
Thurston Dang	ffff7bb582	Reapply "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120…464)" (#120511 ) This reverts commit 2691b964150c77a9e6967423383ad14a7693095e. This reapply fixes the buildbot breakage of the original patch, by updating clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge (the default, which is merge, is applied by the driver but not clang_cc1). This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c. ---- Original commit message: '-mllvm -ubsan-unique-traps' (https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan checks. This patch introduces -fsanitize-merge (defaults to on, maintaining the status quo behavior) and -fno-sanitize-merge (equivalent to '-mllvm -ubsan-unique-traps'), with the option to selectively applying non-merged handlers to a subset of UBSan checks (e.g., -fno-sanitize-merge=bool,enum). N.B. we do not use "trap" in the argument name since https://github.com/llvm/llvm-project/pull/119302 has generalized -ubsan-unique-traps to work for non-trap modes (min-rt and regular rt). This patch does not remove the -ubsan-unique-traps flag; that will override -f(no-)sanitize-merge.	2024-12-18 18:13:26 -08:00
Thurston Dang	2691b96415	Revert "[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464 )" This reverts commit 7eaf4708098c216bf432fc7e0bc79c3771e793a4. Reason: buildbot breakage (e.g., https://lab.llvm.org/buildbot/#/builders/144/builds/14299/steps/6/logs/FAIL__Clang__ubsan-trap-debugloc_c)	2024-12-18 23:50:01 +00:00
Thurston Dang	7eaf470809	[ubsan] Add -fsanitize-merge (and -fno-sanitize-merge) (#120464 ) '-mllvm -ubsan-unique-traps' (https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan checks. This patch introduces -fsanitize-merge (defaults to on, maintaining the status quo behavior) and -fno-sanitize-merge (equivalent to '-mllvm -ubsan-unique-traps'), with the option to selectively applying non-merged handlers to a subset of UBSan checks (e.g., -fno-sanitize-merge=bool,enum). N.B. we do not use "trap" in the argument name since https://github.com/llvm/llvm-project/pull/119302 has generalized -ubsan-unique-traps to work for non-trap modes (min-rt and regular rt). This patch does not remove the -ubsan-unique-traps flag; that will override -f(no-)sanitize-merge.	2024-12-18 15:36:12 -08:00
Alexander Kornienko	23a239267e	Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460 ) Reverts llvm/llvm-project#119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822 https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255	2024-12-18 19:06:34 +01:00
Florian Hahn	c135f6ffe2	[TySan] Add initial Type Sanitizer support to Clang) (#76260 ) This patch introduces the Clang components of type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32198. The Clang changes are mostly formulaic, the one specific change being that when the TBAA sanitizer is enabled, TBAA is always generated, even at -O0. It goes together with the corresponding LLVM changes (https://github.com/llvm/llvm-project/pull/76259) and compiler-rt changes (https://github.com/llvm/llvm-project/pull/76261) PR: https://github.com/llvm/llvm-project/pull/76260	2024-12-17 15:13:42 +00:00
SpencerAbson	908e30658d	[AArch64] Implement intrinsics for FP8 SME FMLAL/FMLALL (multi) (#119546 ) This patch implements the following intrinsics: Multi-vector 8-bit floating-point multiply-add long (multiple vectors). ``` c // Only if __ARM_FEATURE_SME_F8F16 != 0 void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); // Only if __ARM_FEATURE_SME_F8F32 != 0 void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); ``` In accordance with https://github.com/ARM-software/acle/pull/323	2024-12-17 11:47:20 +00:00
SpencerAbson	9c89b40f18	[AArch64] Implement intrinsics for FMLAL/FMLALL (single) (#119568 ) Multi-vector 8-bit floating-point multiply-add long (single) ```c // Only if __ARM_FEATURE_SME_F8F16 != 0 void svmla[_single]_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla[_single]_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla[_single]_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); // Only if __ARM_FEATURE_SME_F8F32 != 0 void svmla[_single]_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla[_single]_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla[_single]_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); ``` In accordance with https://github.com/ARM-software/acle/pull/323. Co-authored-by: Momchil Velikov momchil.velikov@arm.com	2024-12-17 09:31:54 +00:00
Florian Mayer	514580b438	[MTE] Apply alignment / size in AsmPrinter rather than IR (#111918 ) This makes sure no optimizations are applied that assume the bigger alignment or size, which could be incorrect if we link together with non-instrumented code.	2024-12-17 00:47:02 -08:00
SpencerAbson	38099d0608	[AArch64] Implement intrinsics for SME FP8 FMLAL/FMLALL (Indexed) (#118549 ) This patch implements the following intrinsics: Multi-vector 8-bit floating-point multiply-add long. ``` c // Only if __ARM_FEATURE_SME_F8F16 != 0 void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, uint64_t imm_idx fpm_t fpm) __arm_streaming __arm_inout("za"); // Only if __ARM_FEATURE_SME_F8F32 != 0 void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm)__arm_streaming __arm_inout("za"); void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm)__arm_streaming __arm_inout("za"); void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm)__arm_streaming __arm_inout("za"); ``` In accordance with: https://github.com/ARM-software/acle/pull/323	2024-12-16 21:45:38 +00:00
Jonathan Thackray	8380bafaed	[AArch64] Add intrinsics for SME FP8 FVDOT, FVDOTB and FVDOTT intrinsics (#119922 ) Add support for the following SME 8 bit floating-point dot-product intrinsics: ``` // Only if __ARM_FEATURE_SME_F8F16 != 0 void svvdot_lane_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za"); // Only if __ARM_FEATURE_SME_F8F32 != 0 void svvdott_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za"); void svvdotb_lane_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, uint64_t imm_idx, fpm_t fpm) __arm_streaming __arm_inout("za"); ``` --------- Co-authored-by: Momchil Velikov <momchil.velikov@arm.com> Co-authored-by: Marian Lukac <marian.lukac@arm.com>	2024-12-16 14:42:45 +00:00
Jonathan Thackray	ef4b597015	[AArch64] Add intrinsics for SME FP8 FDOT single and multi instructions (#119845 ) Add support for the following SME 8 bit floating-point dot-product intrinsics: ``` // Only if __ARM_FEATURE_SME_F8F16 != 0 void svdot[_single]_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svdot[_single]_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svdot_za16[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svdot_za16[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); // Only if __ARM_FEATURE_SME_F8F32 != 0 void svdot[_single]_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svdot[_single]_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svdot_za32[_mf8]_vg1x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svdot_za32[_mf8]_vg1x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); ``` These intrinsics are extracted from: https://github.com/ARM-software/acle/pull/323/ Co-authored-by: Momchil Velikov <momchil.velikov@arm.com> Co-authored-by: Marian Lukac <marian.lukac@arm.com>	2024-12-16 13:14:40 +00:00
Daniil Kovalev	f65a21a4ec	[PAC][ELF][AArch64] Support signed personality function pointer (#119361 ) Re-apply #113148 after revert in #119331 If function pointer signing is enabled, sign personality function pointer stored in `.DW.ref.__gxx_personality_v0` section with IA key, 0x7EAD = `ptrauth_string_discriminator("personality")` constant discriminator and address diversity enabled.	2024-12-16 10:24:09 +03:00
Momchil Velikov	2eed88da6a	[AArch64] Implement FP8 SVE intrinsics for fused multiply-add (#118126 ) This patch adds the following intrinsics: * 8-bit floating-point multiply-add long to half-precision (bottom). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat16_t svmlalb[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat16_t svmlalb[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point multiply-add long to half-precision (bottom, indexed). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat16_t svmlalb_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_15, fpm_t fpm); * 8-bit floating-point multiply-add long to half-precision (top). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat16_t svmlalt[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat16_t svmlalt[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point multiply-add long to half-precision (top, indexed). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat16_t svmlalt_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_15, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (bottom bottom). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlallbb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat32_t svmlallbb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (bottom bottom, indexed). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlallbb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_15, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (bottom top). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlallbt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat32_t svmlallbt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (bottom top, indexed). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlallbt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_15, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (top bottom). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlalltb[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat32_t svmlalltb[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (top bottom, indexed). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlalltb_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_15, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (top top). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlalltt[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat32_t svmlalltt[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point multiply-add long long to single-precision (top top, indexed). // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8FMA) \|\| __ARM_FEATURE_SSVE_FP8FMA svfloat32_t svmlalltt_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_15, fpm_t fpm);	2024-12-13 21:05:27 +00:00
Momchil Velikov	c2172431c7	[AArch64] Implements FP8 SVE intrinsics for dot-product (#118125 ) This patch adds the following intrinsics: * 8-bit floating-point dot product to single-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) \|\| __ARM_FEATURE_SSVE_FP8DOT4 svfloat32_t svdot[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat32_t svdot[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point indexed dot product to single-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) \|\| __ARM_FEATURE_SSVE_FP8DOT4 svfloat32_t svdot_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_3, fpm_t fpm); * 8-bit floating-point dot product to half-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) \|\| __ARM_FEATURE_SSVE_FP8DOT2 svfloat16_t svdot[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat16_t svdot[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point indexed dot product to half-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) \|\| __ARM_FEATURE_SSVE_FP8DOT2 svfloat16_t svdot_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_7, fpm_t fpm);	2024-12-13 14:06:54 +00:00
Nikita Popov	a30e50fcb3	[BasicAA] Do not decompose past casts with different index width (#119365 ) BasicAA currently tries to support addrspacecasts that change the index width by performing the decomposition in the maximum of all index widths and then trying to fix this up with in-place sign extends to get correct overflow behavior if the actual index width is smaller. However, even in the case where we don't mix different index widths and just have an index width that is smaller than the maximum, the behavior is incorrect (see test), because we only perform the index width adjustment during decomposition and not any of the later logic -- and we don't do anything at all for variable offsets. I'm sure that the case where we actually mix different index widths is even more broken than that. Fix this by not allowing decomposition through index width changes. If the pointers have different index widths, fall back to a base object comparison, ignoring the offsets.	2024-12-13 12:58:59 +01:00

1 2 3 4 5 ...

9599 Commits