In streaming mode, the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt
intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic
to a multiple of @llvm.vscale(). This patch lowers the SME intrinsic the
same way when in streaming mode.
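As a minimal sketch (assuming arm_sme.h and the ACLE streaming attribute; the function name is illustrative):
```c
#include <arm_sme.h>

// svcntsd() maps to @llvm.aarch64.sme.cntsd. In a streaming function the
// streaming vector length equals the runtime vector length, so the call
// can be folded to a multiple of vscale, just like the SVE svcntd().
uint64_t streaming_doublewords(void) __arm_streaming {
  return svcntsd();
}
```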
This patch works towards consolidating all Clang debug-info tests into the
`clang/test/DebugInfo` directory
(https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958).
Here we move only the `clang/test/CodeGen` tests.
I came up with the list of files as follows:
1. searched for anything with `*debug-info*` in the filename
2. searched for occurrences of `debug-info-kind` in the tests
I created a couple of subdirectories in `clang/test/DebugInfo` where I
thought it made sense (mostly when the tests were target-specific).
There are a couple of tests in `clang/test/CodeGen` that still set
`-debug-info-kind`. They probably don't need to, but I'm not changing
that as part of this PR.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to mark only a prefix of an alloca alive/dead.
We never used that capability, so this also removes the need to handle
that possibility everywhere (many key places, including stack coloring,
did not actually respect it anyway).
Issue originally raised in
https://github.com/llvm/llvm-project/issues/71362#issuecomment-3028515618.
Certain NEON intrinsics that operate on poly types (e.g. poly8x8_t)
failed to compile with the -fno-lax-vector-conversions flag. This patch
updates NeonEmitter.cpp to insert an explicit __builtin_bit_cast from
poly types to the required signed integer vector types when generating
lane-related intrinsics. A test 'neon-bitcast-poly.ll' is included.
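A minimal reproducer of the failure (illustrative; any lane intrinsic on a poly type shows the same issue):
```c
// Compile with: clang --target=aarch64-linux-gnu -fno-lax-vector-conversions
#include <arm_neon.h>

poly8x8_t set_first_lane(poly8x8_t v, poly8_t x) {
  // Previously failed to compile: the generated header implicitly converted
  // poly8x8_t to the signed integer vector type the builtin expects.
  return vset_lane_p8(x, v, 0);
}
```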
Previously, the unsigned NEON intrinsic variants 'vqshrun_high_n' and
'vqrshrun_high_n' used signed integer types for their first argument and
return values. According to developer.arm.com, these should be unsigned.
Adjust the intrinsics and their test cases accordingly.
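For example (a sketch; the signature follows the ACLE documentation on developer.arm.com):
```c
#include <arm_neon.h>

// The narrow half being completed and the result are unsigned; only the
// wide input being shifted down remains signed.
uint8x16_t narrow_high(uint8x8_t lo, int16x8_t wide) {
  return vqshrun_high_n_s16(lo, wide, 4);  // was typed int8x8_t/int8x16_t
}
```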
Let Clang emit the `dead_on_return` attribute on pointer arguments that
are passed indirectly, namely large aggregates that the source passes by
value but the ABI lowers to a pointer to a temporary. The parameter is
destroyed within the callee, so writes to such arguments are not
observable by the caller after the callee returns.
This should enable further MemCpyOpt/DSE optimizations.
Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
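A sketch of the kind of argument affected (the struct size is illustrative; under AAPCS64 such aggregates are passed via a pointer to a caller-allocated copy):
```c
// The hidden pointer parameter for `b` can now carry dead_on_return:
// the callee owns the copy, so stores through it die at the return.
struct Big { long data[16]; };

long first(struct Big b) { return b.data[0]; }
```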
Feature flags protect instructions, not data types. This means only
builtins for instructions protected by +bf16 must be guarded by it.
Builtins that treat the data as opaque 16-bit values (e.g. loads, stores,
and shuffles) should be freely available with the underlying SVE feature.
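For instance (a sketch, assuming only the base SVE feature is enabled):
```c
#include <arm_sve.h>

// Loads and stores treat bf16 as opaque 16-bit data, so with this change
// these builtins require only SVE, not +bf16.
svbfloat16_t copy(svbool_t pg, const bfloat16_t *src, bfloat16_t *dst) {
  svbfloat16_t v = svld1_bf16(pg, src);
  svst1_bf16(pg, dst, v);
  return v;
}
```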
The quadword vector instructions were introduced by SVE2p1/SME2p1, but
the corresponding builtins were not available in streaming mode.
RAX1 is available in streaming mode when SME2p1 is available.
Builtins that are enabled via +sve2p1 in non-streaming mode and +sme{2}
in streaming mode should also be enabled via +sve+sme{2} in
non-streaming mode and +sme+sve2p1 in streaming mode.
llvm.aarch64.set.fpmr only writes to inaccessible memory. Tag it with
the IntrWriteMem and IntrInaccessibleMemOnly properties so the optimiser
can treat it as a pure write.
The original patch did not add this property, causing the intrinsic to
be conservatively treated as readwrite. This commit fixes that.
Adds sve-sha3 to reference FEAT_SVE_SHA3 without specifically enabling
SVE2. The SVE2 requirement for AES, SHA3, and BitPerm is replaced with
SVE for non-streaming functions.
As reported in #135064, the generic pointer coercion code in
CoerceIntOrPtrToIntOrPtr cannot handle address space casts (it tries to bitcast
the pointers). This patch bails out of the coercion if an address space
qualifier is found on the pointer.
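A hypothetical reproducer (the address space number and struct shape are arbitrary illustrations, not taken from the issue):
```c
// A small aggregate containing a pointer in a non-default address space;
// the generic coercion would try to bitcast across address spaces.
struct S {
  int __attribute__((address_space(1))) *p;
};

struct S pass_through(struct S s) { return s; }
```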
This updates the element types used in the new __Int8x8_t types added in
#126945, mostly to allow C++ name mangling in the Itanium mangler's
mangleAArch64VectorBase to work correctly. Char is replaced by
SignedCharTy or UnsignedCharTy as required, and Float16Ty is replaced by
HalfTy to match the vector types. The same goes for the Long types.
The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct that
is <= 128 bits in size and contains only one or two pointers, we convert to a
ptr or [2 x ptr], as opposed to the old coercion that used i64 or
[2 x i64]. This helps alias analysis produce more accurate results.
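For example (sketch):
```c
// A 16-byte struct of two pointers is now coerced to [2 x ptr] rather than
// [2 x i64], so no ptrtoint/inttoptr pair is introduced at the call boundary.
struct Pair { int *first; int *second; };

int sum(struct Pair p) { return *p.first + *p.second; }
```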
With the current behavior the following example yields a linker error:
"multiple definition of `foo.default'"
```c
// Translation Unit 1
__attribute__((target_clones("dotprod, sve"))) int foo(void) { return 1; }

// Translation Unit 2
int foo(void) { return 0; }
__attribute__((target_version("dotprod"))) int foo(void);
__attribute__((target_version("sve"))) int foo(void);
int bar(void) { return foo(); }
```
That is because foo.default is generated twice. As a user I don't find
this particularly intuitive. If I wanted the default to be generated in
TU1 I'd rather write target_clones("dotprod, sve", "default")
explicitly.
When changing the code I noticed that the RISC-V target defers the
resolver emission when encountering a target_version definition. This
seems accidental since it only makes sense for AArch64, where we only
emit a resolver once we've processed the entire TU, and only if the
default version is present. I've changed this so that RISC-V immediately
emits the resolver. I adjusted the codegen tests since the functions
now appear in a different order.
Implements https://github.com/ARM-software/acle/pull/377
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
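A sketch of the scenario (assuming, as the description implies, that target features can be pushed via `#pragma clang attribute`):
```c
// With this patch, f should receive vscale_range (e.g. vscale_range(1,16)
// for SVE) derived from the pushed "target-features", enabling
// vector-length-aware optimizations.
#pragma clang attribute push(__attribute__((target("sve"))), apply_to = function)
void f(float *a, float *b, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += b[i];
}
#pragma clang attribute pop
```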
This patch adds fp8 variants to existing intrinsics whose operation
doesn't depend on the arguments being of a specific type.
It also changes the in-memory representation of the mfloat8 type from
`i8` to `<1 x i8>`.
Similarly to #135016, refactor getPTrue to return splat(1) for
all-active patterns. The main motivation for this is to improve
codegen for fixed-length vector loads/stores that are converted to SVE
masked memory ops when the vectors are wider than Neon. Emitting the
mask as a splat helps DAGCombiner simplify all-active masked
loads/stores into unmasked ones, for which it already has suitable
combines and ISel has suitable patterns.
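For example (a sketch; compile with something like -march=armv8-a+sve -msve-vector-bits=256):
```c
// A 256-bit fixed-length vector is wider than Neon, so these accesses are
// lowered as SVE masked loads/stores; with the all-active mask emitted as
// splat(1), DAGCombiner can fold them into plain loads/stores.
typedef float v8f32 __attribute__((vector_size(32)));

v8f32 add(const v8f32 *a, const v8f32 *b) { return *a + *b; }
```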
The original __gcsss intrinsic was implemented based on:
https://github.com/ARM-software/acle/pull/260
with the signature: const void *__gcsss(const void *)
Per the updated specification in:
https://github.com/ARM-software/acle/pull/364
both const qualifiers have been removed. This commit updates the
signature accordingly to: void *__gcsss(void *)
This aligns the implementation with the latest ACLE definition.
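Usage with the updated signature (a sketch; requires a GCS-enabled target, e.g. -march=armv9.4-a+gcs):
```c
#include <arm_acle.h>

// Switch to a new Guarded Control Stack and return the outgoing one,
// with no const qualifiers per the updated ACLE.
void *switch_gcs(void *new_stack) {
  return __gcsss(new_stack);
}
```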
Implement all mf8 FMOP4A instructions in clang and llvm following the
ACLE in https://github.com/ARM-software/acle/pull/381/files.
It also updates the previous mop4 intrinsics from IntrNoMem to
IntrInaccessibleMemOnly.
RecordLayout::UnadjustedAlignment was documented as "Maximum of the
alignments of the record members in characters", but
RecordLayout::getUnadjustedAlignment(), which just returns
UnadjustedAlignment, was documented as getting "the record alignment in
characters, before alignment adjustement." These are not the same thing:
the former excludes alignment of base classes, the latter takes it into
account. ItaniumRecordLayoutBuilder::LayoutBase was setting it according
to the former, but the AAPCS calling convention handling, currently the
only user, relies on it being set according to the latter.
Fixes #135551.
This patch relands https://github.com/llvm/llvm-project/pull/130990.
If the check value is passed by reference, add `memory(read)`.
Original PR description:
This patch adds `memory(argmem: read, inaccessiblemem: readwrite)` to
**recoverable** ubsan handlers in order to unblock some
memory/loop optimizations. It provides an average of 3% performance
improvement on llvm-test-suite (except for 49 test failures due to ubsan
diagnostics).
SVE operations such as predicated loads are canonicalized to LLVM
masked loads, and doing the same for ptrue(all) to splat(1) creates
further optimization opportunities for generic LLVM IR passes.
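For instance (illustrative):
```c
#include <arm_sve.h>

// With ptrue(all) emitted as splat(1), the masked load produced for this
// all-active predicated load can be simplified by generic IR passes.
svfloat64_t load_all(const double *p) {
  return svld1_f64(svptrue_b64(), p);
}
```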
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the ACLE in
https://github.com/ARM-software/acle/pull/381/files.
This PR depends on https://github.com/llvm/llvm-project/pull/127797
This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
Currently arm_neon.h emits C-style casts to perform vector type casts. These
rely on implicit conversion between vector types, which is deprecated
behaviour and will soon be removed. To ensure NEON code keeps working
afterwards, this patch changes all of these vector type casts into bitcasts.
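An illustration of the emitted pattern (a simplified sketch, not the literal header text):
```c
#include <arm_neon.h>

// Old emission relied on a lax vector conversion:  (int8x8_t)v
// New emission uses an explicit bit cast instead:
static inline int8x8_t as_signed(uint8x8_t v) {
  return __builtin_bit_cast(int8x8_t, v);
}
```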
Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Removes attr-target-version.c, which doesn't have a clear purpose.
Introduces AArch64/fmv-detection.c to check detection bitmasks.
Adds coverage in AArch64/fmv-resolver-emission.c
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c
svfloat64_t foo(const double *x) {
svbool_t pg = svptrue_b64();
return svld1_f64(pg, x+svcntd());
}
```
When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
ptrue p0.d
mov x8, #2
ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
ret
```
Instead, we could be generating:
```gas
foo:
ldr z0, [x0, #1, mul vl]
ret
```
Likewise for other types, stores, and other VLS lengths.
This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.