Only set a target guard if it deviates from its default value[1].
When a target guard is set, it is automatically AND'd with its default
value. This means there is no need to use SVETargetGuard="sve,bf16"
because SVETargetGuard="bf16" is sufficient.
[1] Defaults: SVETargetGuard="sve", SMETargetGuard="sme"
Rather than filtering the calling function's features the PR splits the
builtin guard into distinct non-streaming and streaming guards that are
compared to the active features in full.
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files.
This PR depends on https://github.com/llvm/llvm-project/pull/127797
This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
This leverages the sharded structure of the builtins to make it easy to
directly tablegen most of the AArch64 and ARM builtins while still using
X-macros for a few edge cases. It also extracts common prefixes as part
of that.
This makes the string tables for these targets dramatically smaller.
This is especially important as the SVE builtins represent (by far) the
largest string table and largest builtin table across all the targets in
Clang.
- The FP8 scalar type (`__mfp8`) was described as a vector type
- The FP8 vector types were described/assumed to have integer element
type (the element type ought to be `__mfp8`)
- Add support for `m` type specifier (denoting `__mfp8`) in
`DecodeTypeFromStr` and create builtin function prototypes using that
specifier, instead of `int8_t`
Replacing the extant streaming mode function call with an intrinsic
allows us to make further optimisations around it. For example, if it's
called within a function that has a known streaming mode, we can remove
the dead code, and avoid the redundant conditional branch.
- Switch to an enumerated type approach, which is less error-prone as we
continue to add new types. This is similar to NeonEmitter.
- Fix existing faulty typespec modifiers
This patch implements the following intrinsics:
8-bit floating-point convert to deinterleaved half-precision or
BFloat16.
``` c
// Variant is also available for: _bf16[_mf8]_x2
svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```
Defined in https://github.com/ARM-software/acle/pull/323
Co-authored-by: Caroline Concatto caroline.concatto@arm.com
Co-authored-by: Marian Lukac marian.lukac@arm.com
The implementation made the assumption that any feature starting with
"sve" meant that this was an SVE feature. This is not the case for
"sve-b16b16", as this is a feature that applies to both SVE and SME.
This meant that:
```
__attribute__((target("+sme2,+sve2,+sve-b16b16")))
svbfloat16_t foo(svbfloat16_t a, svbfloat16_t b, svbfloat16_t c)
__arm_streaming {
return svclamp_bf16(a, b, c);
}
```
would result in an incorrect diagnostic saying that `svclamp_bf16` could
only be used in non-streaming functions.
This patch moves NEON immediate argument specification and checking to
the system currently shared by both SVE and SME.
In its current form, the TableGen definition of a NEON intrinsic cannot
control how its immediate arguments are range-checked, this information
must be inferred from the name of the intrinsic by NeonEmitter, which
also assumes that any NEON instruction will only ever receive a single
immediate argument. For SVE/SME instrinsics, this information is more
conveniently supplied in the TableGen definition.
As a result, for each immediate argument, NEON instructions must define
- The index of the immediate argument to be checked
- The type of immediate range check to be performed,
(e.g., ImmCheckShiftRight)
- The index of the argument whose type defines the context
of this immediate check (base type, vector size).
- **Difference from SVE/SME** If this definition generates a polymorphic
NEON builtin, the base type defined by this argument is overwritten by
that of the type code supplied to the overloaded builtin call. This
third argument is omitted in some cases due to this.
Here is an example for
[`vfma_laneq`](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vfma_laneq)
- The immediate is supplied in argument 3
- The immediate is used as an index into the lanes of argument 2
- So we must perform an immediate check on argument 3, based on the type
information of argument 2.
- `ImmCheck<3, ImmCheckLaneIndex, 2>`
During this work, we discovered that the existing immediate
range-checking system was largely untested, which made it difficult to
make reliable progress. Missing tests have been added to verify this
implementation against all intrinsics which take constrained immediate
arguments. All test immediate range checking tests for NEON intrinsics
are moved to a dedicated directory
`clang/test/Sema/aarch64-neon-immediate-ranges/`.
One reason to want to split this up is to simplify the code added in
#93802, where it checks the SME streaming-mode requirements for a
builtin by checking for the absence of SVE. If the target guards are
separate, we can generate a table and make the Sema code to verify the
runtime mode simpler.
Another reason is to avoid an issue with a check in SveEmitter.cpp where
it ensures that the 'VerifyRuntimeMode' is set correctly for functions
that have both SVE and SME target guards:
if (!Def->isFlagSet(VerifyRuntimeMode) &&
Def->getGuard().contains("sve") &&
Def->getGuard().contains("sme"))
llvm_unreachable("Missing VerifyRuntimeMode flag");
However, if we ever add a new feature with "sme" in the name, even
though it is unrelated to FEAT_SME, then this code no longer works.
Note that the arm_sve.td and arm_sme.td files could do with a bit of
restructuring after this but it seems better to follow that up in an NFC
patch.
PR #76975 added 'IsStreamingOrSVE2p1' to emit a diagnostic when a builtin marked
with 'IsStreamingOrSVE2p1' is used in a non-streaming function that is not
compiled with `+sve2p1`.
The problem is a bit more complex than only this case. For example, we've marked
lots of builtins with 'IsStreamingCompatible', meaning it can be used in either
streaming, streaming-compatible or non-streaming functions. But the code in
SemaChecking, doesn't check the appropriate target guards. This issue becomes
relevant when SVE builtins are only available in streaming mode, e.g. when
compiling for SME without SVE.
If we were to add the appropriate target guards, we'd have to add many more
combinations, e.g.:
IsStreamingSMEOrSVE
IsStreamingSME2OrSVE2
IsStreamingSMEOrSVE2p1
IsStreamingSME2OrSVE2p1
etc.
To avoid having to add more combinations (and avoid having to add more in the
future for new extensions), we use a single 'IsSVEOrStreamingSVE' flag for all
builtins that are available in streaming mode for the appropriate SME flags, or
in non-streaming mode for the appropriate SVE flags, or both. The code in
SemaChecking will then verify for which mode (or both) the builtin would be
defined, given the target features of the function/compilation unit.
For example:
'svclamp' is enabled under FEAT_SVE2p1 and FEAT_SME2
* When we compile for SVE2p1 and SME (but not SME2), the builtin is undefined
behaviour when called from a streaming function.
* When we compile for SME2 and SVE2 (but not SVE2p1), the builtin is undefined
behaviour when called from a non-streaming function.
* When we compile for _both_ SVE2p1 and SME2, the builtin can be used in either
mode (non-streaming, streaming or streaming-compatible)
The intrinsics are currently defined as:
```
__aio __attribute__((target("sve")))
svint8_t svreinterpret_s8(svuint8_t op) __arm_streaming_compatible {
return __builtin_sve_reinterpret_s8_u8(op);
}
```
which doesn't work when calling it from an __arm_streaming function when
only +sme is available. By defining it in the same way as we've defined
all the other intrinsics, we can leave it to the code in SemaChecking to
verify that either +sve or +sme is available.
This PR also fixes the target guards for the svreinterpret_c and
svreinterpret_b intrinsics, that convert between svcount_t and svbool_t,
as these are available both in SME2 and SVE2p1.
The arm_sme.td file was still using `IsSharedZA` and `IsPreservesZA`,
which should be changed to match the new state attributes added in
#76971.
This patch adds `IsInZA`, `IsOutZA` and `IsInOutZA` as the state for the
Clang builtins and fixes up the code in SemaChecking and SveEmitter to
match.
Note that the code is written in such a way that it can be easily
extended with ZT0 state (to follow in a future patch).
These attributes were using the GNU attribute syntax, rather than the new
keyword attribute syntax, and they are no longer required as we have code
in SemaChecking to verify whether a builtin is compatible with its caller.
This patch replaces the `__arm_new_za`, `__arm_shared_za` and
`__arm_preserves_za` attributes in favour of:
* `__arm_new("za")`
* `__arm_in("za")`
* `__arm_out("za")`
* `__arm_inout("za")`
* `__arm_preserves("za")`
As described in https://github.com/ARM-software/acle/pull/276.
One change is that `__arm_in/out/inout/preserves(S)` are all mutually
exclusive, whereas previously it was fine to write `__arm_shared_za
__arm_preserves_za`. This case is now represented with `__arm_in("za")`.
The current implementation uses the same LLVM attributes under the hood,
since `__arm_in/out/inout` are all variations of "shared ZA", so can use
the existing `aarch64_pstate_za_shared` attribute in LLVM.
#77941 will add support for the new "zt0" state as introduced
with SME2.
This patch adds a warning that's emitted when a builtin call uses ZA
state but the calling function doesn't provide any.
Patch by David Sherwood <david.sherwood@arm.com>.
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
This is a re-upload of #74064 and fixes a compile time increase.
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t
pg, svuint8_t zn);
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64;
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
svminqv[_u8](svbool_t pg, svuint8_t zn);
// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t
svminnmqv[_f16](svbool_t pg, svfloat16_t zn);
According to the PR#257[1]
The reduction instruction uses scalable vectors as input and fixed
vectors as output, therefore we changed SVEEmitter to emit fixed vector
types in case the neon header(arm_neon.h) is not present.
[1]https://github.com/ARM-software/acle/pull/257
Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
This patch is needed for the reduction instructions in sve2.1
It add a new header to sve with all the fixed vector types.
The new types are only added if neon is not declared.