209 Commits

Author SHA1 Message Date
Amina Chabane
62744f3681
[AArch64][NEON] NEON intrinsic compilation error with -fno-lax-vector-conversion flag fix (#149329)
Issue originally raised in
https://github.com/llvm/llvm-project/issues/71362#issuecomment-3028515618.
Certain NEON intrinsics that operate on poly types (e.g. poly8x8_t)
failed to compile with the -fno-lax-vector-conversions flag. This patch
updates NeonEmitter.cpp to insert an explicit __builtin_bit_cast from
poly types to the required signed integer vector types when generating
lane-related intrinsics. A test 'neon-bitcast-poly.ll' is included.
2025-07-30 10:56:14 +01:00
Jon Roelofs
a0fcb50bf9
[ARM] Improve arm_neon.h header diagnostic when included on unsupported targets (#147817)
The footgun here was that the preprocessor diagnostic that looks for
__ARM_FP would fire when included on targets like x86_64, but the
suggestion it gives in that case is totally bogus. Avoid giving bad
advice, by first checking whether we're being built for an appropriate
target, and only then do the soft-fp check.

rdar://155449666
2025-07-11 10:21:13 -07:00
Rahul Joshi
3932360b14
[LLVM][TableGen] Rename ListInit::getValues() to getElements() (#140289)
Rename `ListInit::getValues()` to `getElements()` to better match with
other `ListInit` members like `getElement`. Keep `getValues()` for
existing downstream code but mark it deprecated.
2025-05-19 12:16:33 -07:00
Lukacma
6fc0312919
[Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (#128019)
This patch adds fp8 variants to existing intrinsics, whose operation
doesn't depend on arguments being a specific type.

It also changes mfloat8 type representation in memory from `i8` to
`<1xi8>`
2025-05-15 14:01:41 +01:00
Kazu Hirata
8e2a9fa9a5
[TableGen] Use std::tie to implement operator< (NFC) (#139405) 2025-05-10 16:04:26 -07:00
Jay Foad
2bc6f9d4b6
[TableGen] Only store direct superclasses in Record (#123072)
In Record only store the direct superclasses instead of all
superclasses. getSuperClasses recurses to find all superclasses when
necessary.

This gives a small reduction in memory usage. On lib/Target/X86/X86.td I
measured about 2.0% reduction in total bytes allocated (measured by
valgrind) and 1.3% reduction in peak memory usage (measured by
/usr/bin/time -v).

---------

Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
2025-04-24 18:57:51 +01:00
Kazu Hirata
f2ec5e40d9
[clang] Use llvm::unique (NFC) (#136469) 2025-04-19 20:33:53 -07:00
Lukacma
6c3adaafe3
[AARCH64][Neon] switch to using bitcasts in arm_neon.h where appropriate (#127043)
Currently arm_neon.h emits C-style casts to do vector type casts. This
relies on implicit conversion between vector types to be enabled, which
is currently deprecated behaviour and soon will disappear. To ensure
NEON code will keep working afterwards, this patch changes all this
vector type casts into bitcasts.


Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2025-04-01 09:45:16 +01:00
Kazu Hirata
c6c394634c
[clang] Use *Set::insert_range (NFC) (#132507)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-22 08:06:38 -07:00
Kazu Hirata
69b70110b7
[TableGen] Avoid repeated hash lookups (NFC) (#132142) 2025-03-20 09:10:23 -07:00
Oliver Stannard
a619a2e53a
[ARM] Fix lane ordering for AdvSIMD intrinsics on big-endian targets (#127068)
In arm-neon.h, we insert shufflevectors around each intrinsic when the
target is big-endian, to compensate for the difference between the
ABI-defined memory format of vectors (with the whole vector stored as
one big-endian access) and LLVM's target-independent expectations (with
the lowest-numbered lane in the lowest address). However, this code was
written for the AArch64 ABI, and the AArch32 ABI differs slightly: it
requires that vectors are stored in memory as-if stored with VSTM, which
does a series of 64-bit accesses, instead of the AArch64 VSTR, which
does a single 128-bit access. This means that for AArch32 we need to
reverse the lanes in each 64-bit chunk of the vector, instead of in the
whole vector.

Since there are only a small number of different shufflevector orderings
needed, I've split them out into macros, so that this doesn't need
separate conditions in each intrinsic definition.
2025-03-04 08:10:22 +00:00
Chandler Carruth
64ea3f5a47 [StrTable] Switch AArch64 and ARM to use directly TableGen-ed builtin tables
This leverages the sharded structure of the builtins to make it easy to
directly tablegen most of the AArch64 and ARM builtins while still using
X-macros for a few edge cases. It also extracts common prefixes as part
of that.

This makes the string tables for these targets dramatically smaller.
This is especially important as the SVE builtins represent (by far) the
largest string table and largest builtin table across all the targets in
Clang.
2025-02-04 18:04:58 +00:00
Momchil Velikov
db6fa74dfe
[AArch64] Implement FP8 Neon reinterpret intrinsics (#120476) 2025-01-28 11:06:24 +00:00
Momchil Velikov
99bd2e3f12
[AArch64] Add Neon FP8 conversion intrinsics (#123612)
The patch adds the following intrinsics:

    bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm)
mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn,
float32x4_t vm, fpm_t fpm)
    
mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm)
mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t
fpm)

Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>
2025-01-27 17:32:47 +00:00
Momchil Velikov
87103a016f
[AArch64] Implement NEON FP8 vectors as VectorType (#123603)
Reimplement Neon FP8 vector types using attribute `neon_vector_type`
instead of having them as builtin types.
This allows to implement FP8 Neon intrinsics without the need to add
special cases for these types when using `__builtin_shufflevector`
or bitcast (using C-style cast operator) between vectors, both
extensively used in the generated code in `arm_neon.h`.
2025-01-27 10:41:53 +00:00
Momchil Velikov
dac49e8ddd
[Arm] Fix generating code with UB in NeonEmitter (#121802)
When generating `arm_neon.h`, NeonEmitter outputs code that
violates strict aliasing rules (C23 6.5 Expressions #7,
C++23 7.2.1 Value category [basic.lval] #11), for example:

    bfloat16_t __reint = __p0;
    uint32_t __reint1 = (uint32_t)(*(uint16_t *) &__reint) << 16;
    __ret = *(float32_t *) &__reint1;

This patch fixed the offending code by replacing it with
a call to `__builtin_bit_cast`.
2025-01-24 10:57:23 +00:00
CarolineConcatto
aaba8406c5
[NFC][Clang][AArch64]Refactor implementation of Neon vectors MFloat8… (#114804)
…x8 and MFloat8x16

This patch adds MFloat8 as a TypeFlag and Kind on Neon to generate the
typedefs using emitNeonTypeDefs.
It does not need any change in Clang, because SEMA and CodeGen use the
Builtins defined in AArch64SVEACLETypes.def
2024-11-21 10:29:28 +00:00
Rahul Joshi
63aa8cf6be
[NFC][Clang][TableGen] Fix file header comments (#116491) 2024-11-17 07:54:10 -08:00
CarolineConcatto
91aad9bfb2
[Clang][AArch64]Fix Name and Mangle name for scalar fp8 (#114983)
The scalar __mfp8 type has the wrong name and mangle name in
AArch64SVEACLETypes.def

According to the ACLE[1] the name should be __mfp8

This patch fixes this problem by replacing
the Name __MFloat8_t by __mfp8
and
the Mangle Name __MFloat8_t by u6__mfp8

And we revert the incorrect typedef in NeonEmitter.

[1]https://github.com/ARM-software/acle
2024-11-15 09:19:39 +00:00
Kazu Hirata
173529104d
[TableGen] Use heterogenous lookups with std::map (NFC) (#115682)
Heterogenous lookups allow us to call find with StringRef, avoiding a
temporary heap allocation of std::string.
2024-11-11 07:34:42 -08:00
Kazu Hirata
a44ee8ec1c
[TableGen] Use heterogenous lookups with std::map (NFC) (#115633)
Heterogenous lookups allow us to call find with StringRef, avoiding a
temporary heap allocation of std::string.
2024-11-10 07:24:27 -08:00
Momchil Velikov
1df5c94343
[AArch64] Implement FP8 floating-point mode helper intrinsics (#100608)
Implement FP8 mode helper intrinsics (as inline functions) as
specified in ACLE 2024Q3 "14.2 Helper intrinsics"

https://github.com/ARM-software/acle/releases/download/r2024Q3/acle-2024Q3.pdf
2024-10-28 11:22:38 +00:00
CarolineConcatto
49940514e2
[CLANG][AArch64] Add the modal 8 bit floating-point scalar type (#97277)
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point
intrinsic.

From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````

The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the
FPMR.

This patch is an attempt to the add the mfloat8_t scalar type. It has a
parser and codegen for the new scalar type.

The patch it is lowering to and 8bit unsigned as it has no format. But
maybe we should add another opaque type.

[1]  https://github.com/ARM-software/acle/pull/323
2024-10-25 13:59:46 +01:00
Jay Foad
4dd55c567a
[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399)
Follow up to #109133.
2024-10-24 10:23:40 +01:00
CarolineConcatto
6dad29aebc
[CLANG][AArch64]Add Neon vectors for mfloat8_t (#99865)
This patch adds these new vector sizes for neon:
   mfloat8x16_t and mfloat8x8_t

    According to the ARM ACLE PR#323[1].

    [1] ARM-software/acle#323
2024-10-23 13:23:18 +01:00
Rahul Joshi
9b422d14f3
[Clang][TableGen] Use const pointers for various Init objects in NeonEmitter (#112317)
Use const pointers for various Init objects in NeonEmitter. This is a
part of effort to have better const correctness in TableGen backends:

https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
2024-10-15 15:48:42 -07:00
Rahul Joshi
e9dbdb20f2
[Clang][TableGen] Change NeonEmitter to use const Record * (#110597)
This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
2024-10-01 10:47:09 -07:00
Rahul Joshi
0e948bfd31
[NFC][clang][TableGen] Remove redundant llvm:: namespace qualifier (#108627)
Remove llvm:: from .cpp files, and add "using namespace llvm" if needed.
2024-09-16 06:35:34 -07:00
Kazu Hirata
bae275f65e
[TableGen] Avoid repeated map lookups (NFC) (#108675) 2024-09-14 07:39:00 -07:00
Rahul Joshi
a4b1617368
[clang][TableGen] Change NeonEmitter to use const RecordKeeper (#108501)
Change NeonEmitter to use const RecordKeeper.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
2024-09-13 07:52:37 -07:00
Rahul Joshi
1651014960
[TableGen] Change SetTheory set/vec to use const Record * (#107692)
Change SetTheory::RecSet/RecVec to use const Record pointers.
2024-09-09 08:47:42 -07:00
SpencerAbson
1f70fcefa9
[Clang][AArch64] Add customisable immediate range checking to NEON (#100278)
This patch moves NEON immediate argument specification and checking to
the system currently shared by both SVE and SME.

In its current form, the TableGen definition of a NEON intrinsic cannot
control how its immediate arguments are range-checked, this information
must be inferred from the name of the intrinsic by NeonEmitter, which
also assumes that any NEON instruction will only ever receive a single
immediate argument. For SVE/SME instrinsics, this information is more
conveniently supplied in the TableGen definition.

As a result, for each immediate argument, NEON instructions must define
- The index of the immediate argument to be checked
- The type of immediate range check to be performed,
    (e.g., ImmCheckShiftRight)
- The index of the argument whose type defines the context
    of this immediate check (base type, vector size).
- **Difference from SVE/SME** If this definition generates a polymorphic
NEON builtin, the base type defined by this argument is overwritten by
that of the type code supplied to the overloaded builtin call. This
third argument is omitted in some cases due to this.

Here is an example for
[`vfma_laneq`](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vfma_laneq)
- The immediate is supplied in argument 3
- The immediate is used as an index into the lanes of argument 2
- So we must perform an immediate check on argument 3, based on the type
information of argument 2.
- `ImmCheck<3, ImmCheckLaneIndex, 2>`

During this work, we discovered that the existing immediate
range-checking system was largely untested, which made it difficult to
make reliable progress. Missing tests have been added to verify this
implementation against all intrinsics which take constrained immediate
arguments. All test immediate range checking tests for NEON intrinsics
are moved to a dedicated directory
`clang/test/Sema/aarch64-neon-immediate-ranges/`.
2024-09-06 13:12:37 +01:00
Rahul Joshi
d7da79f2cd
[NFC][SetTheory] Refactor to use const pointers and range loops (#105544)
- Refactor SetTheory code to use const pointers when possible.
- Use auto for variables initialized using dyn_cast<>.
- Use range based for loops and early continue.
2024-08-22 05:47:31 -07:00
Lukacma
0284b4b4b6
[Clang][NEON] Add neon target guard to intrinsics (#99870)
This patch improves reported error when NEON intrinsics are used without
neon target feature.
2024-07-22 14:21:31 +01:00
Lukacma
c1622cae10
Revert "[Clang][NEON] Add neon target guard to intrinsics" (#99864)
Reverts llvm/llvm-project#98624
2024-07-22 12:32:47 +01:00
Lukacma
dc82c774a7
[Clang][NEON] Add neon target guard to intrinsics (#98624)
This patch improves reported error when NEON intrinsics are used without
neon target feature.
2024-07-22 12:03:59 +01:00
Lukacma
8a46bbbc22
[Clang] Remove preprocessor guards and global feature checks for NEON (#95224)
To enable function multi-versioning (FMV), current checks which rely on
cmd line options or global macros to see if target feature is present
need to be removed. This patch removes those for NEON and also
implements changes to NEON header file as proposed in
[ACLE](https://github.com/ARM-software/acle/pull/321).
2024-06-25 17:19:42 +02:00
Eli Friedman
8c9f45e2de
[ARM64EC] Fix arm_neon.h on ARM64EC. (#88572)
Since 97fe519d, in ARM64EC mode, we don't define `__aarch64__`. Fix
various preprocessor guards to account for this.
2024-04-16 17:08:02 -07:00
Kazu Hirata
e6bafbe726 [TableGen] Use StringRef::consume_{front,back} (NFC) 2024-01-25 18:17:24 -08:00
Sam Tebbs
945c645acb
[AArch64][SME] Warn when using a streaming builtin from a non-streaming function (#75487)
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.

Uses work by Kerry McLaughlin.

This is a re-upload of #74064 and fixes a compile time increase.
2023-12-18 09:32:34 +00:00
Sam Tebbs
342384ca05
Revert "[AArch64][SME] Warn when using a streaming builtin from a non-streaming function" (#75449)
Reverts llvm/llvm-project#74064
2023-12-14 09:31:55 +00:00
Sam Tebbs
2e45326b08
[AArch64][SME] Warn when using a streaming builtin from a non-streaming function (#74064)
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.

Uses work by Kerry McLaughlin.
2023-12-14 00:11:04 +00:00
CarolineConcatto
ed2d497291
[Clang][AArch64] Add fix vector types to header into SVE (#73258)
This patch is needed for the reduction instructions in sve2.1
 It add a new header to sve with all the fixed vector types.
  The new types are only added if neon is not declared.
2023-12-13 08:59:41 +00:00
Simon Pilgrim
141122ece3 [TableGen] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-03 17:53:56 +00:00
Kazu Hirata
dd27036ff7 [TableGen] Modernize OverloadInfo (NFC) 2023-09-04 13:35:26 -07:00
Lucas Prates
2b7ac62606 [AArch64][RCPC3] Add Neon intrinsics for LDAP1 and STL1
This adds new intrisics to support the LDAP1 and STL1 Advanced SIMD
(Neon) instructions introduced as part of FEAT_LRCPC3.
The new intrinsics `vldap1(q)_lane`/`vstl1(q)_lane` generate IR code
similar to the existing `vld1(q)_lane/st1(q)_lane` ones, but capturing
the difference in the atomic release/acquire memory model.

The LLVM code generation changes to ensure that this instruction pair
is lowered to the correct LDAP1/STL1 instructions will be covered in a
separate commit.

Based on a patch by Sam Elliott.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D153128
2023-07-07 12:31:55 +01:00
Dimitry Andric
db49231639 [clang][BFloat] Avoid redefining bfloat16_t in arm_neon.h
As of https://reviews.llvm.org/D79708, clang-tblgen generates `arm_neon.h`,
`arm_sve.h` and `arm_bf16.h`, and all those generated files will contain a
typedef of `bfloat16_t`. However, `arm_neon.h` and `arm_sve.h` include
`arm_bf16.h` immediately before their own typedef:

    #include <arm_bf16.h>
    typedef __bf16 bfloat16_t;

With a recent version of clang (I used 16.0.1) this results in warnings:

    /usr/lib/clang/16/include/arm_neon.h:38:16: error: redefinition of typedef 'bfloat16_t' is a C11 feature [-Werror,-Wtypedef-redefinition]

Since `arm_bf16.h` is very likely supposed to be the one true place where
`bfloat16_t` is defined, I propose to delete the duplicate typedefs from the
generated `arm_neon.h` and `arm_sve.h`.

Reviewed By: sdesmalen, simonbutcher

Differential Revision: https://reviews.llvm.org/D148822
2023-05-03 17:54:58 +02:00
Manna, Soumi
38ecb9767c [NFC][clang] Fix Coverity bugs with AUTO_CAUSES_COPY
Reported by Coverity:
AUTO_CAUSES_COPY
Unnecessary object copies can affect performance.

1. Inside "ExtractAPIVisitor.h" file, in clang::extractapi::impl::ExtractAPIVisitorBase<<unnamed>::BatchExtractAPIVisitor>::VisitFunctionDecl(clang::FunctionDecl const *): Using the auto keyword without an & causes the copy of an object of type DynTypedNode.

2. Inside "NeonEmitter.cpp" file, in <unnamed>::Intrinsic::Intrinsic(llvm::Record *, llvm::StringRef, llvm::StringRef, <unnamed>::TypeSpec, <unnamed>::TypeSpec, <unnamed>::ClassKind, llvm::ListInit *, <unnamed>::NeonEmitter &, llvm::StringRef, llvm::StringRef, bool, bool): Using the auto keyword without an & causes the copy of an object of type Type.

3. Inside "MicrosoftCXXABI.cpp" file, in <unnamed>::MSRTTIBuilder::getClassHierarchyDescriptor(): Using the auto keyword without an & causes the copy of an object of type MSRTTIClass.

4. Inside "CGGPUBuiltin.cpp" file, in clang::CodeGen::CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(clang::CallExpr const *): Using the auto keyword without an & causes the copy of an object of type CallArg.

5. Inside "SemaDeclAttr.cpp" file, in threadSafetyCheckIsSmartPointer(clang::Sema &, clang::RecordType const *): Using the auto keyword without an & causes the copy of an object of type CXXBaseSpecifier.

6. Inside "ComputeDependence.cpp" file, in clang::computeDependence(clang::DesignatedInitExpr *): Using the auto keyword without an & causes the copy of an object of type Designator.

7. Inside "Format.cpp" file, In clang::format::affectsRange(llvm::ArrayRef<clang::tooling::Range>, unsigned int, unsigned int): Using the auto keyword without an & causes the copy of an object of type Range.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D149074
2023-04-24 14:52:55 -07:00
Kazu Hirata
9cf4419e24 [clang] Use std::optional instead of llvm::Optional (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-02 15:54:57 -08:00
Kazu Hirata
f7dffc28b3 Don't include None.h (NFC)
I've converted all known uses of None to std::nullopt, so we no longer
need to include None.h.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-10 11:24:26 -08:00