The arm_sme.td file was still using `IsSharedZA` and `IsPreservesZA`,
which should be changed to match the new state attributes added in
#76971.
This patch adds `IsInZA`, `IsOutZA` and `IsInOutZA` as the state for the
Clang builtins and fixes up the code in SemaChecking and SveEmitter to
match.
Note that the code is written in such a way that it can be easily
extended with ZT0 state (to follow in a future patch).
These attributes were using the GNU attribute syntax, rather than the new
keyword attribute syntax, and they are no longer required as we have code
in SemaChecking to verify whether a builtin is compatible with its caller.
This patch replaces the `__arm_new_za`, `__arm_shared_za` and
`__arm_preserves_za` attributes in favour of:
* `__arm_new("za")`
* `__arm_in("za")`
* `__arm_out("za")`
* `__arm_inout("za")`
* `__arm_preserves("za")`
As described in https://github.com/ARM-software/acle/pull/276.
One change is that `__arm_in/out/inout/preserves(S)` are all mutually
exclusive, whereas previously it was fine to write `__arm_shared_za
__arm_preserves_za`. This case is now represented with `__arm_in("za")`.
The current implementation uses the same LLVM attributes under the hood,
since `__arm_in/out/inout` are all variations of "shared ZA", so can use
the existing `aarch64_pstate_za_shared` attribute in LLVM.
#77941 will add support for the new "zt0" state as introduced
with SME2.
This patch adds a warning that's emitted when a builtin call uses ZA
state but the calling function doesn't provide any.
Patch by David Sherwood <david.sherwood@arm.com>.
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
This is a re-upload of #74064 and fixes a compile time increase.
This PR adds a warning that's emitted when a non-streaming or
non-streaming-compatible builtin is called in an unsuitable function.
Uses work by Kerry McLaughlin.
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t
pg, svuint8_t zn);
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64;
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
svminqv[_u8](svbool_t pg, svuint8_t zn);
// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t
svminnmqv[_f16](svbool_t pg, svfloat16_t zn);
According to the PR#257[1]
The reduction instruction uses scalable vectors as input and fixed
vectors as output, therefore we changed SVEEmitter to emit fixed vector
types in case the neon header(arm_neon.h) is not present.
[1]https://github.com/ARM-software/acle/pull/257
Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
This patch is needed for the reduction instructions in sve2.1
It add a new header to sve with all the fixed vector types.
The new types are only added if neon is not declared.
Adds the following SME2 builtins:
- sv(add|sub)
- sv(add|sub)_za32/za64,
- sv(add|sub)_write_za32/za64
Other changes in this patch:
- CGBuiltin.cpp: The GetAArch64SMEProcessedOperands function is created
to avoid duplicating existing code from EmitAArch64SVEBuiltinExpr.
- arm_sve.td: The add/sub SME2 builtins which do not operate on ZA have
been added to arm_sve.td, matching the corrosponding LLVM IR intrinsic
names which start with @llvm.aarch64.sve for this reason.
- SveEmitter.cpp: Adds the createCoreHeaderIntrinsics function to remove
duplicated code in createHeader & createSMEHeader. Uses a new enum
(ACLEKind) to choose either "__builtin_sme_" or "__builtin_sve_" when
emitting the intrinsics.
See https://github.com/ARM-software/acle/pull/217/files
This patch adds reinterpret builtins as proposed here:
https://github.com/ARM-software/acle/pull/275.
The builtins take the form:
sv<dst>x<N>_t svreinterpret_<dst>_<src>_x<N>(sv<src>x<N>_t op)
where
- <src> and <dst> designate the source and the destination type,
respectively, all pairs chosen from {s8, u8, s16, u8, s32, u32, s64,
u64, bf16, f16, f32, f64}
- <N> designated the number of tuple elements, 2, 3 or 4
A short (overloaded) for is also provided, where the destination type is
explicitly designated and the source type is deduced from the parameter
type. These take the form
sv<dst>x<N>_t svreinterpret_<dst>(sv<src>x<N>_t op)
For example:
svuin16x2_t svreinterpret_u16_s32_x2(svint32x2_t op);
svuin16x2_t svreinterpret_u16(svint32x2_t op);
* Mark SVE ACLE types as substitution candidates.
* Change mangling of svbfloat16_t from __SVBFloat16_t to
__SVBfloat16_t.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
This is an ABI break with the old behaviour available via
"-fclang-abi-compat=17".
Otherwise these functions are not inlined when invoked from streaming
functions.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D159188
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):
- svread_hor_za8[_s8]_m // also for u8
- svread_hor_za16[_s16]_m // also for u16, f16, bf16
- svread_hor_za32[_s32]_m // also for u32, f32
- svread_hor_za64[_s64]_m // also for u64, f64
- svread_hor_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
- svread_ver_za8[_s8]_m // also for u8
- svread_ver_za16[_s16]_m // also for u16, f16, bf16
- svread_ver_za32[_s32]_m // also for u32, f32
- svread_ver_za64[_s64]_m // also for u64, f64
- svread_ver_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
- svwrite_hor_za8[_s8]_m // also for u8
- svwrite_hor_za16[_s16]_m // also for u16, f16, bf16
- svwrite_hor_za32[_s32]_m // also for u32, f32
- svwrite_hor_za64[_s64]_m // also for u64, f64
- svwrite_hor_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
- svwrite_ver_za8[_s8]_m // also for u8
- svwrite_ver_za16[_s16]_m // also for u16, f16, bf16
- svwrite_ver_za32[_s32]_m // also for u32, f32
- svwrite_ver_za64[_s64]_m // also for u64, f64
- svwrite_ver_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>
Reviewed By: sdesmalen, kmclaughlin
Differential Revision: https://reviews.llvm.org/D128648
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):
- svld1_hor_za8 // also for _za16, _za32, _za64 and _za128
- svld1_hor_vnum_za8 // also for _za16, _za32, _za64 and _za128
- svld1_ver_za8 // also for _za16, _za32, _za64 and _za128
- svld1_ver_vnum_za8 // also for _za16, _za32, _za64 and _za128
- svst1_hor_za8 // also for _za16, _za32, _za64 and _za128
- svst1_hor_vnum_za8 // also for _za16, _za32, _za64 and _za128
- svst1_ver_za8 // also for _za16, _za32, _za64 and _za128
- svst1_ver_vnum_za8 // also for _za16, _za32, _za64 and _za128
SveEmitter.cpp is extended to generate arm_sme.h (currently named
arm_sme_draft_spec_subject_to_change.h) and other SME definitions from
arm_sme.td, which is modeled after arm_sve.td. Common TableGen definitions
are moved into arm_sve_sme_incl.td.
Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>
Reviewed By: sdesmalen, kmclaughlin
Differential Revision: https://reviews.llvm.org/D127910
Reported by Static Analyzer Tool, Coverity:
Bad bit shift operation
The operation may have an undefined behavior or yield an unexpected result.
In <unnamed>::SVEEmitter::encodeFlag(unsigned long long, llvm::StringRef): A bit shift operation has a shift amount which is too large or has a negative value.
// Returns the SVETypeFlags for a given value and mask.
uint64_t encodeFlag(uint64_t V, StringRef MaskName) const {
auto It = FlagTypes.find(MaskName);
//Condition It != llvm::StringMap<unsigned long long, llvm::MallocAllocator>::const_iterator const(this->FlagTypes.end()), taking true branch.
if (It != FlagTypes.end()) {
uint64_t Mask = It->getValue();
//return_constant: Function call llvm::countr_zero(Mask) may return 64.
//assignment: Assigning: Shift = llvm::countr_zero(Mask). The value of Shift is now 64.
unsigned Shift = llvm::countr_zero(Mask);
//Bad bit shift operation (BAD_SHIFT)
//large_shift: In expression V << Shift, left shifting by more than 63 bits has undefined behavior. The shift amount, Shift, is 64.
return (V << Shift) & Mask;
}
llvm_unreachable("Unsupported flag");
}
Asserting Mask != 0 will not suffice to silence Coverity. While Coverity can specifically observe that countr_zero might return 0 (because TrailingZerosCounter<T, 8>::count() has a return 64 statement), It seems like Coverity can not determine that the function can't return 65 or higher. Coverity is reporting is that the shift might overflow,
so that is what should be guarded.
assert(Shift < 64 && "Mask value produced an invalid shift value");
Reviewed By: tahonermann, sdesmalen, erichkeane
Differential Revision: https://reviews.llvm.org/D150140
As of https://reviews.llvm.org/D79708, clang-tblgen generates `arm_neon.h`,
`arm_sve.h` and `arm_bf16.h`, and all those generated files will contain a
typedef of `bfloat16_t`. However, `arm_neon.h` and `arm_sve.h` include
`arm_bf16.h` immediately before their own typedef:
#include <arm_bf16.h>
typedef __bf16 bfloat16_t;
With a recent version of clang (I used 16.0.1) this results in warnings:
/usr/lib/clang/16/include/arm_neon.h:38:16: error: redefinition of typedef 'bfloat16_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
Since `arm_bf16.h` is very likely supposed to be the one true place where
`bfloat16_t` is defined, I propose to delete the duplicate typedefs from the
generated `arm_neon.h` and `arm_sve.h`.
Reviewed By: sdesmalen, simonbutcher
Differential Revision: https://reviews.llvm.org/D148822
This patch makes SVE intrinsics more useable by gating them on the
target, not by ifdef preprocessor macros. See #56480. This alters the
SVEEmitter for arm_sve.h to remove the #ifdef guards and instead use
TARGET_BUILTIN with the correct features so that the existing "'func'
needs target feature sve" error will be generated when sve is not
present.
The ArchGuard containing defines in the SVEEmitter are changed to
TargetGuard containing target features. In the arm_neon.h emitter there
are both existing ArchGuard ifdefs mixed with new TargetGuard target
feature guards, so the name is change in the SVE too for consistency.
The few functions that are present in arm_sve.h (as opposed to builtin
aliases) have __attribute__((target("sve"))) added. Some of the tests
needed to be rejigged a little, as well as updating the error message,
as the error now happens at a later point.
Differential Revision: https://reviews.llvm.org/D131064
arm_sve.h defines and uses __ai macro which needs to be undefined (as it is
already in arm_neon.h).
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D131580
This fixes a subtle issue where:
svprf(pg, ptr, SV_ALL /*is sv_pattern instead of sv_prfop*/)
would be quietly accepted. With this change, the function declaration
guards that the third parameter is a `enum sv_prfop`. Previously `svprf`
would map directly to `__builtin_sve_svprfb`, which accepts the enum
operand as a signed integer and only checks that the incoming range is
valid, meaning that SV_ALL would be discarded as being outside the valid
immediate range, but would have allowed SV_VL1 without issuing a warning
(C) or error (C++).
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D100297
This patch changes the builtin prototype to use 'b' (boolean) instead
of the default integer element type. That fixes the dup/dupq intrinsics
when compiling with C++.
This patch also fixes one of the defines for __ARM_FEATURE_SVE2_BITPERM.
Reviewed By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D100294
The inline keyword is not defined in the C89 standard, so source files
that include arm_sve.h will fail compilation if -std=c89 is specified.
For consistency with arm_neon.h, we should use __inline__ instead.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D96852
Adapt the declarations of `svpattern` and `svprfop` to the most recent
one defined in section "5. Enum declarations" of the SVE ACLE
specifications [1].
The signature of the intrinsics using these enums have been changed
accordingly.
A test has been added to make sure that `svpattern` and `svprfop` are
not typedefs.
[1] https://developer.arm.com/documentation/100987/latest, version
00bet6
Reviewed By: joechrisellis
Differential Revision: https://reviews.llvm.org/D91333
bfloat16 variants of svdup_lane were missing, and svcvtnt_bf16_x
was implemented incorrectly (it takes an operand for the inactive
lanes)
Reviewers: fpetrogalli, efriedma
Reviewed By: fpetrogalli
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82908
The original patch was reverted in
ff5ccf258e
as it was missing the C tests that got accidentally missing.
This patch is a NFC of https://reviews.llvm.org/D82501, together with
the SVE ACLE tests for the C intrinsics of svreinterpret for brain
float types.
This reverts commit a15722c5ce4759c12960fe434ee6bd8aac70bb16.
The commmit has to be reverted because I accidentally submit
https://reviews.llvm.org/D82501 without the C tests that were added in
an early version of the patch.
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]
Reviewers: stuij, efriedma, c-rhodes, fpetrogalli
Reviewed By: fpetrogalli
Tags: #clang, #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D82187
Summary:
svbfloat16_t should only be defined if the __ARM_FEATURE_SVE_BF16
feature macro is enabled, similar to the scalar bfloat16_t type. Also,
arm_bf16.h should be included in arm_sve.h when
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC is defined.
Patch also contains a fix for ld1ro intrinsic which should be guarded on
__ARM_FEATURE_SVE_BF16 rather than __ARM_FEATURE_BF16_SCALAR_ARITHMETIC,
and a fix for bfmmla test which was missing
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC and -target-feature +bf16 in the
RUN line.
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D82178
Summary:
The new SVE builtin type __SVBFloat16_t` is used to represent scalable
vectors of bfloat elements.
Reviewers: sdesmalen, efriedma, stuij, ctetreau, shafik, rengolin
Subscribers: tschuett, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81304
The AAPCS specifies that the tuple types such as `svint32x2_t`
should use their `arm_sve.h` names when mangled instead of their
builtin names.
This patch also renames the internal types for the tuples to
be prefixed with `__clang_`, so they are not misinterpreted as
specified internal types like the non-tuple types which *are* defined
in the AAPCS. Using a builtin type for the tuples is a purely
a choice of the Clang implementation.
Reviewers: rsandifo-arm, c-rhodes, efriedma, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81721