Add clang builtins for the new tied wmma intrinsics.
These variations tie the destination
accumulator matrix to the input
accumulator matrix.
See https://github.com/llvm/llvm-project/pull/69903 for context.
This reverts commit e8fe4de64ffb84924c41e54116a04570046eed74.
memcpy/memmove instrumentation for -fsanitize=alignment has been tested
on a huge code base. There were some cleanups, but their number does not
justify a workaround.
Break down the counted_by calculations so that they correctly handle
anonymous structs, which are specified internally as IndirectFieldDecls.
Improves the calculation of __bdos on a different field member in the struct,
and also improves support for __bdos on an index into the FAM. If the index
is beyond the length of the FAM, then we return __bdos's "can't
determine the size" value (zero or negative one, depending on type).
Also simplify the code to use helper methods to get the field referenced
by counted_by and the flexible array member itself, which also had some
issues with FAMs in sub-structs.
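A minimal sketch of the FAM-index case described above (the struct layout and the type-1 __bdos call are illustrative assumptions, not code from this patch):
```
#include <stddef.h>

struct flex {
  int count;
  int fam[] __attribute__((counted_by(count)));
};

size_t bytes_at_index(struct flex *p, int idx) {
  /* For an index within p->count the size can be derived from counted_by;
     for an index beyond the FAM's length, __bdos falls back to its "can't
     determine the size" value (-1 for type 1, 0 for type 3). */
  return __builtin_dynamic_object_size(&p->fam[idx], 1);
}
```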
Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses the
already available global variable "oclc_ABI_version" instead of
"llvm.amdgcn.abi.version".
It also adds some minor fields to the ImplicitArg structure.
amdgcn_update_dpp intrinsic (#71139)""
This reverts commit d1fb9307951319eea3e869d78470341d603c8363 and fixes
the lit test clang/test/CodeGenHIP/dpp-const-fold.hip
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Operands of `__builtin_amdgcn_update_dpp` need to evaluate to constants
to match the intrinsic requirements.
Fixes: SWDEV-426822, SWDEV-431138
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Now that `align_{up,down}` use `llvm.ptrmask` (as of #71238), the
assume doesn't preserve any information that is not still easily
re-computable.
Closes #71295
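For reference, a minimal sketch of the builtins involved (the callers and constants are made up for illustration):
```
void *round_up(void *p) {
  /* __builtin_align_up rounds p up to a 16-byte boundary; with #71238 this
     lowers through llvm.ptrmask, so a separate llvm.assume on the result's
     alignment no longer adds information. */
  return __builtin_align_up(p, 16);
}

void *round_down(void *p) {
  /* Likewise for rounding down. */
  return __builtin_align_down(p, 16);
}
```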
This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.
This patch adds reinterpret builtins as proposed here:
https://github.com/ARM-software/acle/pull/275.
The builtins take the form:
sv<dst>x<N>_t svreinterpret_<dst>_<src>_x<N>(sv<src>x<N>_t op)
where
- <src> and <dst> designate the source and the destination type,
respectively, all pairs chosen from {s8, u8, s16, u16, s32, u32, s64,
u64, bf16, f16, f32, f64}
- <N> designates the number of tuple elements, 2, 3 or 4
A short (overloaded) form is also provided, where the destination type is
explicitly designated and the source type is deduced from the parameter
type. These take the form
sv<dst>x<N>_t svreinterpret_<dst>(sv<src>x<N>_t op)
For example:
svuint16x2_t svreinterpret_u16_s32_x2(svint32x2_t op);
svuint16x2_t svreinterpret_u16(svint32x2_t op);
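A hedged usage sketch of the two forms (assumes a compiler and target with the SVE tuple reinterprets available):
```
#include <arm_sve.h>

svuint16x2_t to_u16_explicit(svint32x2_t op) {
  return svreinterpret_u16_s32_x2(op); /* source and destination spelled out */
}

svuint16x2_t to_u16_overloaded(svint32x2_t op) {
  return svreinterpret_u16(op);        /* destination only; source deduced */
}
```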
This patch removes duplicated code in EmitAArch64SVEBuiltinExpr and
EmitAArch64SMEBuiltinExpr by creating a new function called
GetAArch64SVEProcessedOperands which handles splitting up multi-vector
arguments using vector extracts.
These changes are non-functional.
The C language standard defines the library functions `iszero`, `issignaling`
and `issubnormal`, which did not have counterparts among the clang builtin
functions. This change adds new builtins:
__builtin_iszero
__builtin_issubnormal
__builtin_issignaling
They provide builtin implementations for the missing standard functions.
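A hedged usage sketch (values are illustrative; 1e-310 is assumed to be subnormal for IEEE-754 binary64):
```
#include <stdio.h>

int main(void) {
  double zero = 0.0;
  double tiny = 1e-310; /* subnormal for IEEE-754 binary64 */

  printf("%d\n", __builtin_iszero(zero));      /* 1 */
  printf("%d\n", __builtin_issubnormal(tiny)); /* 1 */
  printf("%d\n", __builtin_issignaling(zero)); /* 0 */
  return 0;
}
```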
Pull request: https://github.com/llvm/llvm-project/pull/69041
This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.
smmintrin.h uses __builtin_mffs, __builtin_mffsl, __builtin_mtfsf and
__builtin_set_fpscr_rn. This patch replaces those uses with the ppc-prefixed
versions and implements the missing ones.
Deploying #67766 to a large internal codebase uncovers many bugs (many are
probably benign but need cleaning up). There are also issues in high-profile
open-source projects like v8. Add a cl::opt to disable builtin instrumentation
for -fsanitize=alignment to help large codebase users.
In the long term, this cl::opt option may still be useful to debug
-fsanitize=alignment instrumentation on builtins, so we probably want to
keep it around.
The svldr_vnum_za and svstr_vnum_za builtins/intrinsics currently
require that the vnum argument be an immediate, but since vnum is used
to modify the base register via a mul and add, that restriction is not
necessary. This patch removes that restriction.
This patch adds the CodeGen changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. This change relaxes restrictions on what gets emitted on the device path, when compiling in `hipstdpar` mode:
1. Unless a function is explicitly marked `__host__`, it will get emitted, whereas before only `__device__` and `__global__` functions would be emitted;
2. Unsupported builtins are ignored as opposed to being marked as an error, as the decision on their validity is deferred to the `hipstdpar` specific code selection pass;
3. We add a `hipstdpar` specific pass to the opt pipeline, independent of optimisation level:
- When compiling for the host, iff the user requested it via the `--hipstdpar-interpose-alloc` flag, we add a pass which replaces canonical allocation / deallocation functions with accelerator aware equivalents.
A test to validate that unannotated functions get correctly emitted is added as well.
Reviewed by: yaxunl, efriedma
Differential Revision: https://reviews.llvm.org/D155850
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound
sanitizer and the '__builtin_dynamic_object_size' builtin.
This example specifies that the flexible array member 'array' has
the number of elements allocated for it in 'count':
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
This establishes a relationship between 'array' and 'count',
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained through changes to the structure.
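Building on the 'struct foo' definition above, a hedged sketch of what the
attribute buys __builtin_dynamic_object_size (the helper and the type-1 call
are illustrative, not part of this patch):
```
size_t available_bytes(struct foo *p) {
  /* With counted_by, __bdos can derive p->count * sizeof(struct bar *) here
     instead of returning the "unknown" value (-1 for type 1). */
  return __builtin_dynamic_object_size(p->array, 1);
}
```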
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}

void use_foo(int index) {
  p->count += 42;
  p->array[index] = 0; /* The sanitizer cannot properly check this access */
}
Reviewed By: nickdesaulniers, aaron.ballman
Differential Revision: https://reviews.llvm.org/D148381
The -fsanitize=alignment implementation follows the model that we allow
forming unaligned pointers but disallow accessing unaligned pointers.
See [RFC: Enforcing pointer type alignment in Clang](https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html)
for detail.
memcpy is a memory access and we require an `int *` argument to be aligned.
Similar to https://reviews.llvm.org/D9673, emit a -fsanitize=alignment check for
arguments of the builtin memcpy and memmove functions to catch misaligned loads like:
```
// Check the alignment of a but ignore the alignment of b
void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); }
```
For a reference parameter, we emit a -fsanitize=alignment check as well, which
can be optimized out by InstCombinePass. We rely on the call site
`TCK_ReferenceBinding` check instead.
```
// The alignment check of a will be optimized out.
void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); }
```
The diagnostic message looks like
```
runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *'
```
We could use a better message for memcpy, but we don't do it for now as it would
require a new check name like misaligned-pointer-use, which is probably not
necessary. *RFC: Enforcing pointer type alignment in Clang* is not well documented,
but this patch does not intend to change that.
Technically builtin memset functions can be checked for -fsanitize=alignment as
well, but it does not seem too useful.
This reverts commit 9a954c693573281407f6ee3f4eb1b16cc545033d, which
causes clang crashes when compiling with `-fsanitize=bounds`. See
9a954c6935 (commitcomment-129529574)
for details.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound sanitizer
and the '__builtin_dynamic_object_size' builtin.
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
This establishes a relationship between 'array' and 'count', specifically
that 'p->array' must have *at least* 'p->count' number of elements available.
It's the user's responsibility to ensure that this relationship is maintained
through changes to the structure.
In the following, the allocated array erroneously has fewer elements than
what's specified by 'p->count'. This would result in an out-of-bounds access
not being detected:
struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}
The next example updates 'p->count', breaking the relationship requirement that
'p->array' must have at least 'p->count' number of elements available:
struct foo *p;

void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}

void use_foo(int index) {
  p->count += 42;
  p->array[index] = 0; /* The sanitizer cannot properly check this access */
}
Reviewed By: nickdesaulniers, aaron.ballman
Differential Revision: https://reviews.llvm.org/D148381
The patch fixes Function Multi Versioning feature detection by the ifunc
resolver on Android API levels < 30.
Ifunc hwcaps parameters are not supported on Android API levels 23-29,
so all CPU features are marked as unsupported if they were not initialized
before the ifunc resolver call.
There is no support for ifunc on Android API levels < 23, so Function
Multi Versioning is disabled in this case.
Also use a two-underscore prefix for the FMV runtime support functions to
avoid conflicts with user program ones.
Differential Revision: https://reviews.llvm.org/D158641
- Update CodeGenTypeCache to use a single union for all pointers in
address space zero.
- Introduce a UnqualPtrTy in CodeGenTypeCache, and use that (for
example instead of llvm::PointerType::getUnqual) in some places.
- Drop some redundant bit/pointer casts from ptr to ptr.
Add __builtin_bcopy to the list of GNU builtins. Its absence was causing a
series of test failures in glibc.
Adjust the tests to reflect the changes in codegen.
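A hedged sketch of a call the builtin now covers (the overlapping-copy scenario is illustrative):
```
#include <stddef.h>

void shift_right_by_one(char *buf, size_t n) {
  /* bcopy(src, dst, len) has memmove semantics (overlap-safe); recognizing
     it as a GNU builtin lets clang treat this call like the library bcopy. */
  __builtin_bcopy(buf, buf + 1, n);
}
```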
Fixes #51409.
Fixes #63065.
Implement the _Count* and _Copy* Windows ARM intrinsics:
```
double _CopyDoubleFromInt64(__int64)
float _CopyFloatFromInt32(__int32)
__int32 _CopyInt32FromFloat(float)
__int64 _CopyInt64FromDouble(double)
unsigned int _CountLeadingOnes(unsigned long)
unsigned int _CountLeadingOnes64(unsigned __int64)
unsigned int _CountLeadingSigns(long)
unsigned int _CountLeadingSigns64(__int64)
unsigned int _CountLeadingZeros(unsigned long)
unsigned int _CountLeadingZeros64(unsigned __int64)
unsigned int _CountOneBits(unsigned long)
unsigned int _CountOneBits64(unsigned __int64)
```
Full list of intrinsics here:
[https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics](https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics)
Bug: [65405](https://github.com/llvm/llvm-project/issues/65405)
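A hedged usage sketch (assumes an ARM64 Windows target where these intrinsics are declared in intrin.h):
```
#include <intrin.h>
#include <stdio.h>

int main(void) {
  __int64 bits = _CopyInt64FromDouble(1.0);     /* raw IEEE-754 bit pattern */
  unsigned int lz = _CountLeadingZeros64(1ULL); /* 63 */
  unsigned int ones = _CountOneBits(0xFFUL);    /* 8 */
  printf("%lld %u %u\n", bits, lz, ones);
  return 0;
}
```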
Update handling of math errno. This change updates the logic for
generation of math intrinsics in place of math library function calls.
The previous logic https://reviews.llvm.org/D151834 was incorrectly
using intrinsics when math errno handling was needed at optimization
levels above -O0.
This also fixes the issue mentioned in https://reviews.llvm.org/D151834 by
@uabelho.
This is joint work with Andy Kaylor (@andykaylor).
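A hedged illustration of why the errno distinction matters (plain C, nothing specific to this patch):
```
#include <errno.h>
#include <math.h>
#include <stdio.h>

int main(void) {
  errno = 0;
  double r = sqrt(-1.0); /* domain error */
  /* When math errno handling is required, clang must call the library sqrt
     (which sets errno to EDOM) rather than lowering to the llvm.sqrt
     intrinsic, which would drop the errno side effect. */
  printf("%f %d\n", r, errno == EDOM);
  return 0;
}
```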
The new ACLE PR#225[1] now combines the slice parameters for some
builtins.
Slice specifies the ZA slice number directly and needs to be explicitly
computed by the "user" as the base register plus the immediate offset.
[1] https://github.com/ARM-software/acle/pull/225/files
Summary:
We use the `llvm.amdgcn.abi.version` variable to control code generation.
This is emitted in every module now to indicate what should be used when
compiling. Previously, the logic caused us to emit an external reference
to this variable when creating the code for the `none` type. This would
then cause us not to emit the actual definition. This patch refines the
logic to create the external reference, and then update it if it is
found unset by the time we emit the global. I had to remove the
reference to `GetOrCreateLLVMGlobal` because it did not accept the
proper address space.
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. This patch is the second of 3 patches to update the interface.
Slice specifies the ZA slice number directly and needs to be explicitly
computed by the "user" as the base register plus the immediate offset.
[1] https://github.com/ARM-software/acle/pull/225/files
Update DeviceRTL and the AMDGPU plugin to support code
object version 5. Default is code object version 4.
CodeGen for __builtin_amdgcn_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via Xclang option to generate abi-agnostic code.
Generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the abi version, which is available during linking of
user's OpenMP code. Load of this constant gets eliminated
during linking.
AMDGPU plugin queries the ELF for code object version
and then prepares various implicitargs accordingly.
Differential Revision: https://reviews.llvm.org/D139730
Reviewed By: jhuber6, yaxunl
GCC 12 (https://gcc.gnu.org/PR101696) allows
__builtin_cpu_supports("x86-64") (and -v2 -v3 -v4).
This patch ports the feature.
* Add `FEATURE_X86_64_{BASELINE,V2,V3,V4}` to enum ProcessorFeatures,
but keep CPU_FEATURE_MAX unchanged to make
FeatureInfos/FeatureInfos_WithPLUS happy.
* Change validateCpuSupports to allow `x86-64{,-v2,-v3,-v4}`
* Change getCpuSupportsMask to return `std::array<uint32_t, 4>` where
`x86-64{,-v2,-v3,-v4}` set bits `FEATURE_X86_64_{BASELINE,V2,V3,V4}`.
* `target("x86-64")` and `cpu_dispatch(x86_64)` are invalid. Tested by commit 9de3b35ac9159d5bae6e6796cb91e4f877a07189
Close https://github.com/llvm/llvm-project/issues/59961
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D158811
The FP options specified through pragmas are already encoded in the Expr.
This patch takes the same approach used by clang codegen to emit
fast-math flags for fadd instructions: basically, use RAII to set the
current fast-math flags on the IRBuilder, which is then used to emit the
sqrt intrinsic.
Fixes: https://github.com/llvm/llvm-project/issues/64653
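A hedged source-level illustration of the scenario (assumes -fno-math-errno so the call lowers to the llvm.sqrt intrinsic; float_control is just one way to set FP options via pragma):
```
#include <math.h>

float scaled_sqrt(float x) {
  #pragma float_control(precise, off)
  /* The FP options from the pragma are recorded on the expression; with the
     RAII-scoped fast-math flags on the IRBuilder, the emitted sqrt intrinsic
     is expected to carry matching flags. */
  return 2.0f * sqrtf(x);
}
```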