117 Commits

Author SHA1 Message Date
Kerry McLaughlin
c34cba0413
[AArch64][SME] Lower aarch64.sme.cnts* to vscale when in streaming mode (#154305)
In streaming mode, both the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt
intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic
to @llvm.sme.vscale(). This patch lowers the SME intrinsic similarly when
in streaming-mode.
2025-08-20 09:48:36 +01:00
Michael Buch
3f3bc4853e
[clang][test][DebugInfo] Move debug-info tests from CodeGen to DebugInfo directory (#154311)
This patch works towards consolidating all Clang debug-info into the
`clang/test/DebugInfo` directory
(https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958).

Here we move only the `clang/test/CodeGen` tests.

The list of files i came up with is:
1. searched for anything with `*debug-info*` in the filename
2. searched for occurrences of `debug-info-kind` in the tests

I created a couple of subdirectories in `clang/test/DebugInfo` where I
thought it made sense (mostly when the tests were target-specific).

There's a couple of tests in `clang/test/CodeGen` that still set
`-debug-info-kind`. They probably don't need to do that, but I'm not
changing that as part of this PR.
2025-08-19 18:25:13 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Amina Chabane
62744f3681
[AArch64][NEON] NEON intrinsic compilation error with -fno-lax-vector-conversion flag fix (#149329)
Issue originally raised in
https://github.com/llvm/llvm-project/issues/71362#issuecomment-3028515618.
Certain NEON intrinsics that operate on poly types (e.g. poly8x8_t)
failed to compile with the -fno-lax-vector-conversions flag. This patch
updates NeonEmitter.cpp to insert an explicit __builtin_bit_cast from
poly types to the required signed integer vector types when generating
lane-related intrinsics. A test 'neon-bitcast-poly.ll' is included.
2025-07-30 10:56:14 +01:00
Martin Wehking
933ba27306
Fix implicit vector conversion (#149970)
Previously, the unsigned NEON intrinsic variants of 'vqshrun_high_n' and
'vqrshrun_high_n' were using signed integer types for their first
argument and return values.
These should be unsigned according to developer.arm.com, however.

Adjust the test cases accordingly.
2025-07-23 15:44:46 +01:00
Antonio Frighetto
9e0c06d708 [clang][CodeGen] Set dead_on_return when passing arguments indirectly
Let Clang emit `dead_on_return` attribute on pointer arguments
that are passed indirectly, namely, large aggregates that the
ABI mandates be passed by value; thus, the parameter is destroyed
within the callee. Writes to such arguments are not observable by
the caller after the callee returns.

This should desirably enable further MemCpyOpt/DSE optimizations.

Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
2025-07-18 11:50:18 +02:00
Paul Walker
b152611cbe [NFC][Clang] Merge SVE bfloat specific acle tests with non-bfloat tests. 2025-07-11 14:40:12 +00:00
Paul Walker
584ef94762
[Clang][AArch64] Relax SVE bf16 requirement for opaque builtins. (#147795)
Feature flags protect instructions not datatypes. This means only
builtins associated with +bf16 protected instructions must be guarded.
Those that treat the data as opaque 16-bit values (e.g. loads, store and
shuffles) should be freely available with the underlying SVE feature.
2025-07-11 15:18:21 +01:00
Paul Walker
a4d95c2717
[Clang][AArch64] Add missing builtins for __ARM_FEATURE_SME2p1. (#147362)
The quadword vector instructions introduced by SVE2p1/SME2p1 but the
builtins were not available to streaming mode.
    
RAX1 is available in streaming mode when SME2p1 is available.
2025-07-09 12:40:02 +01:00
Paul Walker
f676014955
[Clang][AArch64] Fix feature guards for SVE2p1 builtins available in SME{2}. (#147086)
Builtins that are enabled via +sve2p1 in non-streaming mode and +sme{2}
in streaming mode should also be enabled via +sve+sme{2} in
non-streaming mode and +sme+sve2p1 in streaming mode.
2025-07-09 11:05:51 +01:00
Elvina Yakubova
bd6e9047dd
[LLVM][AArch64] Relax SVE codegen predicates for sm4 instructions (#147524)
Adds sve-sm4 to reference FEAT_SVE_SM4 without specifically enabling
SVE2.
2025-07-08 17:04:21 +01:00
CarolineConcatto
7ee2c72a8e
[AArch64] Mark aarch64_set_fpmr as IntrWriteMem (#146353)
llvm.aarch64.set.fpmr only writes to inaccessible memory. Tag it with
the IntrWriteMem and IntrInaccessibleMemOnly properties so the optimiser
can treat it as a pure write.

The original patch did not add this property, causing the intrinsic to
be conservatively treated as readwrite. This commit fixes that.
2025-07-04 08:52:36 +01:00
Kerry McLaughlin
33c8d5c686
[Clang][AArch64] Add FP8 variants of Neon store intrinsics (#145346)
Adds FP8 variants for existing VST1, VST2, VST3 & VST4 intrinsics.
2025-06-30 11:30:46 +01:00
amilendra
a72a0f415d
[Clang][AArch64] Add mfloat8_t variants of Neon load intrinsics (#145666)
Add mfloat8_t support for the following Neon load intrinsics.

- VLD1
- VLD1_X2
- VLD1_X3
- VLD1_X4
- VLD1_LANE
- VLD1_DUP
- VLD2
- VLD3
- VLD4
- VLD2_DUP
- VLD3_DUP
- VLD4_DUP
- VLD2_LANE
- VLD3_LANE
- VLD4_LANE
2025-06-30 11:19:14 +01:00
amilendra
5e732c09b2
[CLANG][AArch64] Add mfloat8_t support for more SVE load intrinsics (#145383)
Add mfloat8_t support for the following SVE load intrinsics.

- SVLD1RO
- SVLD1RQ
- SVLDFF1
- SVLDFF1_VNUM
- SVLDNF1
- SVLDNF1_VNUM
2025-06-30 11:18:50 +01:00
Paul Walker
635acfbfca
[LLVM][AArch64] Relax SVE/SME codegen predicates for crypto and bitperm instructions. (#145696)
Adds sve-sha3 to reference FEAT_SVE_SHA3 without specifically enabling
SVE2. The SVE2 requirement for AES, SHA3 and Bitperm is replaced with
SVE for non-streaming function.
2025-06-26 13:01:07 +01:00
David Green
030a471753 [AArch64][Clang] Exclude address spaces from pointer-only coercion types.
As reported on #135064, the generic pointer coercion code in
CoerceIntOrPtrToIntOrPtr cannot handle address space casts (it tries to bitcast
the pointers). This bails out if an address space qualifier is found on the
pointer.
2025-06-12 20:51:58 +01:00
David Green
ddb771ecfd
[AArch64][Clang] Update new Neon vector element types. (#142760)
This updates the element types used in the new __Int8x8_t types added in
#126945, mostly to allow C++ name mangling in ItaniumMangling
mangleAArch64VectorBase to work correctly. Char is replaced by
SignedCharTy or UnsignedCharTy as required and Float16Ty is better using
HalfTy to match the vector types. Same for Long types.
2025-06-11 09:50:26 +01:00
David Green
5f648c370e
[AArch64] Change the coercion type of structs with pointer members. (#135064)
The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct which
is <= 128bits in size, which can only contain either 1 or 2 pointers, we
convert to a ptr or [2 x ptr] as opposed to the old coercion that uses i64 or
[2 x i64]. This helps alias analysis produce more accurate results.
2025-06-10 07:04:54 +01:00
David Green
be6fc0092e [AArch64] Add REQUIRES: aarch64-registered-target to mixed-neon-types.c
Update the new test added in #126945
2025-06-02 17:29:46 +01:00
Tomas Matheson
832a7bb460
[AArch64] Add missing Neon Types (#126945)
The AAPCS64 adds a number of vector types to the C unconditionally:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#11appendix-support-for-advanced-simd-extensions

The equivalent SVE types are already available in clang:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#12appendix-support-for-scalable-vectors

__mfp8 is defined in the ACLE
https://arm-software.github.io/acle/main/acle.html#data-types

---------

Co-authored-by: David Green <david.green@arm.com>
2025-06-02 17:09:35 +01:00
Alexandros Lamprineas
b3fd2ea888
[Clang][FMV] Stop emitting implicit default version using target_clones. (#141808)
With the current behavior the following example yields a linker error:
"multiple definition of `foo.default'"

// Translation Unit 1
__attribute__((target_clones("dotprod, sve"))) int foo(void) { return 1; }

// Translation Unit 2
int foo(void) { return 0; }
__attribute__((target_version("dotprod"))) int foo(void);
__attribute__((target_version("sve"))) int foo(void);
int bar(void) { return foo(); }

That is because foo.default is generated twice. As a user I don't find
this particularly intuitive. If I wanted the default to be generated in
TU1 I'd rather write target_clones("dotprod, sve", "default")
explicitly.

When changing the code I noticed that the RISC-V target defers the
resolver emission when encountering a target_version definition. This
seems accidental since it only makes sense for AArch64, where we only
emit a resolver once we've processed the entire TU, and only if the
default version is present. I've changed this so that RISC-V immediately
emmits the resolver. I adjusted the codegen tests since the functions
now appear in a different order.

Implements https://github.com/ARM-software/acle/pull/377
2025-06-02 11:04:00 +01:00
Matthew Devereau
22576e2cce
[Clang][AArch64] Add pessimistic vscale_range for sve/sme (#137624)
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
2025-05-16 09:39:07 +01:00
Lukacma
6fc0312919
[Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (#128019)
This patch adds fp8 variants to existing intrinsics, whose operation
doesn't depend on arguments being a specific type.

It also changes mfloat8 type representation in memory from `i8` to
`<1xi8>`
2025-05-15 14:01:41 +01:00
David Green
f8f11c541d [AArch64] Add a test case for the coerced arguments. NFC 2025-05-15 11:51:58 +01:00
Ricardo Jesus
af03d6b518
[AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (#139236)
Similarly to #135016, refactor getPTrue to return splat (1) for
all-active patterns. The main motivation for this is to improve
code gen for fixed-length vector loads/stores that are converted to SVE
masked memory ops when the vectors are wider than Neon. Emitting the
mask as a splat helps DAGCombiner simplify all-active masked
loads/stores into unmaked ones, for which it already has suitable
combines and ISel has suitable patterns.
2025-05-12 10:35:30 +01:00
Lukacma
e4d9be3d12
[Clang][AArch64] make bitperm intrinsics available in streaming mode (#129700)
Based on recent changes in armv9.6 BitPerm instructions and thus
intrinsics are available in streaming mode when
[FEAT_SSVE_BitPerm](https://developer.arm.com/documentation/109697/2024_12/Feature-descriptions/The-Armv9-6-architecture-extension?lang=en#md463-the-armv96-architecture-extension__feat_FEAT_SSVE_BitPerm)
is available. This patch reflects this change and is based on [ACLE
proposal](https://github.com/ARM-software/acle/pull/385).
2025-05-02 11:52:09 +01:00
jyli0116
064f9d03f2
[AArch64] Add FEAT_FPAC to supported CPUs (#137330)
Added FEAT_FPAC onto supported AArch64 CPUs which don't have it under
the processor description.
2025-04-28 14:49:06 +01:00
SivanShani-Arm
be044976b6
[AArch64] Update __gcsss intrinsic to match revised ACLE specification (#136850)
The original __gcsss intrinsic was implemented based on:
https://github.com/ARM-software/acle/pull/260
with the signature: const void *__gcsss(const void *)

Per the updated specification in:
https://github.com/ARM-software/acle/pull/364
both const qualifiers have been removed. This commit updates the
signature accordingly to: void *__gcsss(void *)

This aligns the implementation with the latest ACLE definition.
2025-04-24 09:43:23 +01:00
Virginia Cangelosi
5b0cd17c38
[Clang][llvm] Implement fp8 FMOP4A intrinsics (#130127)
Implement all mf8 FMOP4A instructions in clang and llvm following the
acle in https://github.com/ARM-software/acle/pull/381/files.

It also updates previous mop4 instructions from IntrNoMem to
IntrInaccessibleMemOnly
2025-04-23 14:10:13 +01:00
Vitaly Buka
c3000333cd
Revert "[Reland][Clang][CodeGen][UBSan] Add more precise attributes to recoverable ubsan handlers" (#136402)
Reverts llvm/llvm-project#135135

Breaks several bots, details in #135135.
2025-04-18 22:14:03 -07:00
Harald van Dijk
ca9ec7dfc3
[ARM, AArch64] Fix passing of structures with aligned base classes (#135564)
RecordLayout::UnadjustedAlignment was documented as "Maximum of the
alignments of the record members in characters", but
RecordLayout::getUnadjustedAlignment(), which just returns
UnadjustedAlignment, was documented as getting "the record alignment in
characters, before alignment adjustement." These are not the same thing:
the former excludes alignment of base classes, the latter takes it into
account. ItaniumRecordLayoutBuilder::LayoutBase was setting it according
to the former, but the AAPCS calling convention handling, currently the
only user, relies on it being set according to the latter.

Fixes #135551.
2025-04-18 02:11:02 +01:00
Yingwei Zheng
909a9feda9
[Reland][Clang][CodeGen][UBSan] Add more precise attributes to recoverable ubsan handlers (#135135)
This patch relands https://github.com/llvm/llvm-project/pull/130990.
If the check value is passed by reference, add `memory(read)`.

Original PR description:

This patch adds `memory(argmem: read, inaccessiblemem: readwrite)` to
**recoverable** ubsan handlers in order to unblock some
memory/loop optimizations. It provides an average of 3% performance
improvement on llvm-test-suite (except for 49 test failures due to ubsan
diagnostics).
2025-04-17 23:23:30 +08:00
Lukacma
de90487fc1
[AARCH64] Add FEAT_SSVE_FEXPA and fix unsupported features list (#134368)
This patch adds new feature introduced in [2025-03
release](https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/FEXPA--Floating-point-exponential-accelerator-)
and changes feature requirements for fexpa instructions and intrinsics.

Additionally it fixes unsupported features list by moving fearures
dependent on sme2p1 to correct location.
2025-04-16 15:20:05 +01:00
Jonathan Thackray
3d7e56fd28
[AArch64][clang][llvm] Add structured sparsity outer product (TMOP) intrinsics (#135145)
Implement all {BF/F/S/U/SU/US}TMOP intrinsics in clang and llvm
following the ACLE in https://github.com/ARM-software/acle/pull/380/files
2025-04-16 10:59:07 +01:00
Matthew Devereau
91a205653e
[AArch64][SVE] Instcombine ptrue(all) to splat(i1) (#135016)
SVE Operations such as predicated loads become canonicalized to LLVM
masked loads, and doing the same for ptrue(all) to splat(1) creates
further optimization opportunities from generic LLVM IR passes.
2025-04-13 20:40:51 +01:00
Jonathan Thackray
204d8c0d58
[clang][llvm] Fix AArch64 MOP4{A/S} intrinsic tests (NFC) (#134746)
Fix some of the recently-added tests (PRs #127797, #128854, #129226 and
#129230) which were incorrectly defined.
2025-04-08 11:45:47 +01:00
Virginia Cangelosi
79487757b7
[Clang][LLVM] Implement multi-multi vectors MOP4{A/S} (#129230)
Implement all multi-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files
2025-04-01 19:20:27 +01:00
Jonathan Thackray
558ce50ebc
[Clang][LLVM] Implement multi-single vectors MOP4{A/S} (#129226)
Implement all multi-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and
llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files
2025-04-01 17:04:59 +01:00
Virginia Cangelosi
e92ff64bad
[Clang][LLVM] Implement single-multi vectors MOP4{A/S} (#128854)
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files.

This PR depends on https://github.com/llvm/llvm-project/pull/127797

This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
2025-04-01 15:05:30 +01:00
Virginia Cangelosi
6892d54286
[Clang][LLVM] Implement single-single vectors MOP4{A/S} (#127797)
Implement all single-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files
2025-04-01 13:35:09 +01:00
Lukacma
6c3adaafe3
[AARCH64][Neon] switch to using bitcasts in arm_neon.h where appropriate (#127043)
Currently arm_neon.h emits C-style casts to do vector type casts. This
relies on implicit conversion between vector types to be enabled, which
is currently deprecated behaviour and soon will disappear. To ensure
NEON code will keep working afterwards, this patch changes all this
vector type casts into bitcasts.


Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2025-04-01 09:45:16 +01:00
Alexandros Lamprineas
cd3798d7ef
[FMV][AArch64] Add feature CSSC and detect on linux platform. (#132727)
Also removes priority bits for unused features predres and ls64.

Added to ACLE with https://github.com/ARM-software/acle/pull/390
2025-03-26 08:40:29 +00:00
Alexandros Lamprineas
bf2d30e092
[NFC][FMV][AArch64] Tidy up codegen tests. (#132273)
Removes attr-target-version.c which doesn't have a clear purpose.
Introduces AArch64/fmv-detection.c to check detection bitmasks.
Adds coverage in AArch64/fmv-resolver-emission.c
2025-03-24 11:39:51 +00:00
Ricardo Jesus
74f5a028cb
Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#129732)" (#130625)
The original patch from #129732 exposed a bug in `getMemVTFromNode`, which was returning incorrect types for fixed length vectors.
2025-03-19 08:25:37 +00:00
Ricardo Jesus
21610e3ecc
Revert "[AArch64][SVE] Improve fixed-length addressing modes." (#130263)
This reverts commit f01e760c08365426de95f02dc2c2dc670eb47352.
2025-03-07 09:35:55 +00:00
Ricardo Jesus
f01e760c08
[AArch64][SVE] Improve fixed-length addressing modes. (#129732)
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c

svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

Instead, we could be generating:
```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.

This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.
2025-03-06 09:27:07 +00:00
Virginia Cangelosi
4a477eeefa
Fix fp8-init-list.c test failure (#129259)
Fix error in fp8-init-list.c introduced by PR #126726
2025-02-28 15:17:33 +00:00
Virginia Cangelosi
2477f82db9
[clang] Update SVE load and store intrinsics to have FP8 variants (#126726) 2025-02-28 14:20:59 +00:00
Virginia Cangelosi
7b263faf16
[CLANG]Update svget, svset, svcreate, svundef to have FP8 variants (#126754)
This adds FP8 variants to svget, svset, svcreate and svundef under
arm_sve.td
2025-02-27 11:36:08 +00:00