Issue originally raised in
https://github.com/llvm/llvm-project/issues/71362#issuecomment-3028515618.
Certain NEON intrinsics that operate on poly types (e.g. poly8x8_t)
failed to compile with the -fno-lax-vector-conversions flag. This patch
updates NeonEmitter.cpp to insert an explicit __builtin_bit_cast from
poly types to the required signed integer vector types when generating
lane-related intrinsics. A test 'neon-bitcast-poly.ll' is included.
The footgun here was that the preprocessor diagnostic that looks for
__ARM_FP would fire when included on targets like x86_64, but the
suggestion it gives in that case is totally bogus. Avoid giving bad
advice, by first checking whether we're being built for an appropriate
target, and only then do the soft-fp check.
rdar://155449666
Rename `ListInit::getValues()` to `getElements()` to better match with
other `ListInit` members like `getElement`. Keep `getValues()` for
existing downstream code but mark it deprecated.
This patch adds fp8 variants to existing intrinsics, whose operation
doesn't depend on arguments being a specific type.
It also changes mfloat8 type representation in memory from `i8` to
`<1xi8>`
In Record only store the direct superclasses instead of all
superclasses. getSuperClasses recurses to find all superclasses when
necessary.
This gives a small reduction in memory usage. On lib/Target/X86/X86.td I
measured about 2.0% reduction in total bytes allocated (measured by
valgrind) and 1.3% reduction in peak memory usage (measured by
/usr/bin/time -v).
---------
Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
Currently arm_neon.h emits C-style casts to do vector type casts. This
relies on implicit conversion between vector types to be enabled, which
is currently deprecated behaviour and soon will disappear. To ensure
NEON code will keep working afterwards, this patch changes all this
vector type casts into bitcasts.
Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
In arm-neon.h, we insert shufflevectors around each intrinsic when the
target is big-endian, to compensate for the difference between the
ABI-defined memory format of vectors (with the whole vector stored as
one big-endian access) and LLVM's target-independent expectations (with
the lowest-numbered lane in the lowest address). However, this code was
written for the AArch64 ABI, and the AArch32 ABI differs slightly: it
requires that vectors are stored in memory as-if stored with VSTM, which
does a series of 64-bit accesses, instead of the AArch64 VSTR, which
does a single 128-bit access. This means that for AArch32 we need to
reverse the lanes in each 64-bit chunk of the vector, instead of in the
whole vector.
Since there are only a small number of different shufflevector orderings
needed, I've split them out into macros, so that this doesn't need
separate conditions in each intrinsic definition.
This leverages the sharded structure of the builtins to make it easy to
directly tablegen most of the AArch64 and ARM builtins while still using
X-macros for a few edge cases. It also extracts common prefixes as part
of that.
This makes the string tables for these targets dramatically smaller.
This is especially important as the SVE builtins represent (by far) the
largest string table and largest builtin table across all the targets in
Clang.
Reimplement Neon FP8 vector types using attribute `neon_vector_type`
instead of having them as builtin types.
This allows to implement FP8 Neon intrinsics without the need to add
special cases for these types when using `__builtin_shufflevector`
or bitcast (using C-style cast operator) between vectors, both
extensively used in the generated code in `arm_neon.h`.
When generating `arm_neon.h`, NeonEmitter outputs code that
violates strict aliasing rules (C23 6.5 Expressions #7,
C++23 7.2.1 Value category [basic.lval] #11), for example:
bfloat16_t __reint = __p0;
uint32_t __reint1 = (uint32_t)(*(uint16_t *) &__reint) << 16;
__ret = *(float32_t *) &__reint1;
This patch fixed the offending code by replacing it with
a call to `__builtin_bit_cast`.
…x8 and MFloat8x16
This patch adds MFloat8 as a TypeFlag and Kind on Neon to generate the
typedefs using emitNeonTypeDefs.
It does not need any change in Clang, because SEMA and CodeGen use the
Builtins defined in AArch64SVEACLETypes.def
The scalar __mfp8 type has the wrong name and mangle name in
AArch64SVEACLETypes.def
According to the ACLE[1] the name should be __mfp8
This patch fixes this problem by replacing
the Name __MFloat8_t by __mfp8
and
the Mangle Name __MFloat8_t by u6__mfp8
And we revert the incorrect typedef in NeonEmitter.
[1]https://github.com/ARM-software/acle
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point
intrinsic.
From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````
The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the
FPMR.
This patch is an attempt to the add the mfloat8_t scalar type. It has a
parser and codegen for the new scalar type.
The patch it is lowering to and 8bit unsigned as it has no format. But
maybe we should add another opaque type.
[1] https://github.com/ARM-software/acle/pull/323
This patch moves NEON immediate argument specification and checking to
the system currently shared by both SVE and SME.
In its current form, the TableGen definition of a NEON intrinsic cannot
control how its immediate arguments are range-checked, this information
must be inferred from the name of the intrinsic by NeonEmitter, which
also assumes that any NEON instruction will only ever receive a single
immediate argument. For SVE/SME instrinsics, this information is more
conveniently supplied in the TableGen definition.
As a result, for each immediate argument, NEON instructions must define
- The index of the immediate argument to be checked
- The type of immediate range check to be performed,
(e.g., ImmCheckShiftRight)
- The index of the argument whose type defines the context
of this immediate check (base type, vector size).
- **Difference from SVE/SME** If this definition generates a polymorphic
NEON builtin, the base type defined by this argument is overwritten by
that of the type code supplied to the overloaded builtin call. This
third argument is omitted in some cases due to this.
Here is an example for
[`vfma_laneq`](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vfma_laneq)
- The immediate is supplied in argument 3
- The immediate is used as an index into the lanes of argument 2
- So we must perform an immediate check on argument 3, based on the type
information of argument 2.
- `ImmCheck<3, ImmCheckLaneIndex, 2>`
During this work, we discovered that the existing immediate
range-checking system was largely untested, which made it difficult to
make reliable progress. Missing tests have been added to verify this
implementation against all intrinsics which take constrained immediate
arguments. All test immediate range checking tests for NEON intrinsics
are moved to a dedicated directory
`clang/test/Sema/aarch64-neon-immediate-ranges/`.
- Refactor SetTheory code to use const pointers when possible.
- Use auto for variables initialized using dyn_cast<>.
- Use range based for loops and early continue.
To enable function multi-versioning (FMV), current checks which rely on
cmd line options or global macros to see if target feature is present
need to be removed. This patch removes those for NEON and also
implements changes to NEON header file as proposed in
[ACLE](https://github.com/ARM-software/acle/pull/321).
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
This is a re-upload of #74064 and fixes a compile time increase.
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
This patch is needed for the reduction instructions in sve2.1
It add a new header to sve with all the fixed vector types.
The new types are only added if neon is not declared.
This adds new intrisics to support the LDAP1 and STL1 Advanced SIMD
(Neon) instructions introduced as part of FEAT_LRCPC3.
The new intrinsics `vldap1(q)_lane`/`vstl1(q)_lane` generate IR code
similar to the existing `vld1(q)_lane/st1(q)_lane` ones, but capturing
the difference in the atomic release/acquire memory model.
The LLVM code generation changes to ensure that this instruction pair
is lowered to the correct LDAP1/STL1 instructions will be covered in a
separate commit.
Based on a patch by Sam Elliott.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D153128
As of https://reviews.llvm.org/D79708, clang-tblgen generates `arm_neon.h`,
`arm_sve.h` and `arm_bf16.h`, and all those generated files will contain a
typedef of `bfloat16_t`. However, `arm_neon.h` and `arm_sve.h` include
`arm_bf16.h` immediately before their own typedef:
#include <arm_bf16.h>
typedef __bf16 bfloat16_t;
With a recent version of clang (I used 16.0.1) this results in warnings:
/usr/lib/clang/16/include/arm_neon.h:38:16: error: redefinition of typedef 'bfloat16_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
Since `arm_bf16.h` is very likely supposed to be the one true place where
`bfloat16_t` is defined, I propose to delete the duplicate typedefs from the
generated `arm_neon.h` and `arm_sve.h`.
Reviewed By: sdesmalen, simonbutcher
Differential Revision: https://reviews.llvm.org/D148822
Reported by Coverity:
AUTO_CAUSES_COPY
Unnecessary object copies can affect performance.
1. Inside "ExtractAPIVisitor.h" file, in clang::extractapi::impl::ExtractAPIVisitorBase<<unnamed>::BatchExtractAPIVisitor>::VisitFunctionDecl(clang::FunctionDecl const *): Using the auto keyword without an & causes the copy of an object of type DynTypedNode.
2. Inside "NeonEmitter.cpp" file, in <unnamed>::Intrinsic::Intrinsic(llvm::Record *, llvm::StringRef, llvm::StringRef, <unnamed>::TypeSpec, <unnamed>::TypeSpec, <unnamed>::ClassKind, llvm::ListInit *, <unnamed>::NeonEmitter &, llvm::StringRef, llvm::StringRef, bool, bool): Using the auto keyword without an & causes the copy of an object of type Type.
3. Inside "MicrosoftCXXABI.cpp" file, in <unnamed>::MSRTTIBuilder::getClassHierarchyDescriptor(): Using the auto keyword without an & causes the copy of an object of type MSRTTIClass.
4. Inside "CGGPUBuiltin.cpp" file, in clang::CodeGen::CodeGenFunction::EmitAMDGPUDevicePrintfCallExpr(clang::CallExpr const *): Using the auto keyword without an & causes the copy of an object of type CallArg.
5. Inside "SemaDeclAttr.cpp" file, in threadSafetyCheckIsSmartPointer(clang::Sema &, clang::RecordType const *): Using the auto keyword without an & causes the copy of an object of type CXXBaseSpecifier.
6. Inside "ComputeDependence.cpp" file, in clang::computeDependence(clang::DesignatedInitExpr *): Using the auto keyword without an & causes the copy of an object of type Designator.
7. Inside "Format.cpp" file, In clang::format::affectsRange(llvm::ArrayRef<clang::tooling::Range>, unsigned int, unsigned int): Using the auto keyword without an & causes the copy of an object of type Range.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D149074