This patch fixes Function Multi Versioning feature detection by the
ifunc resolver on Android API levels < 30.
Ifunc hwcaps parameters are not supported on Android API levels 23-29,
so all CPU features are treated as unsupported if they were not
initialized before the ifunc resolver call.
There is no ifunc support at all on Android API levels < 23, so Function
Multi Versioning is disabled in that case.
Also use a two-underscore prefix for the FMV runtime support functions
to avoid conflicts with functions in user programs.
Differential Revision: https://reviews.llvm.org/D158641
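For context, a minimal FMV usage sketch (hypothetical function; the
version dispatch happens through the ifunc resolver this patch fixes):
```
// The compiler emits an ifunc whose resolver selects a version at load
// time based on the detected CPU features.
__attribute__((target_version("sve2"))) int triple(int x) { return 3 * x; }
__attribute__((target_version("default"))) int triple(int x) { return x + x + x; }
```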
- Update CodeGenTypeCache to use a single union for all pointers in
address space zero.
- Introduce an UnqualPtrTy in CodeGenTypeCache, and use that (for
example instead of llvm::PointerType::getUnqual) in some places.
- Drop some redundant bit/pointer casts from ptr to ptr.
Add __builtin_bcopy to the list of GNU builtins; its absence was causing
a series of test failures in glibc.
Adjust the tests to reflect the changes in codegen.
Fixes #51409.
Fixes #63065.
Implement the _Count* and _Copy* Windows ARM intrinsics:
```
double _CopyDoubleFromInt64(__int64)
float _CopyFloatFromInt32(__int32)
__int32 _CopyInt32FromFloat(float)
__int64 _CopyInt64FromDouble(double)
unsigned int _CountLeadingOnes(unsigned long)
unsigned int _CountLeadingOnes64(unsigned __int64)
unsigned int _CountLeadingSigns(long)
unsigned int _CountLeadingSigns64(__int64)
unsigned int _CountLeadingZeros(unsigned long)
unsigned int _CountLeadingZeros64(unsigned __int64)
unsigned int _CountOneBits(unsigned long)
unsigned int _CountOneBits64(unsigned __int64)
```
Full list of intrinsics here:
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics
Bug: [65405](https://github.com/llvm/llvm-project/issues/65405)
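A usage sketch (MSVC-compatible mode targeting ARM64; <intrin.h> per the
MS docs):
```
#include <intrin.h>

// Count leading zero bits of a 32-bit value, and reinterpret an
// integer's bits as a double, using two of the new intrinsics.
unsigned int lz32(unsigned long v) { return _CountLeadingZeros(v); }
double as_double(__int64 bits) { return _CopyDoubleFromInt64(bits); }
```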
Update handling of math errno. This change updates the logic for
generating math intrinsics in place of math library function calls.
The previous logic (https://reviews.llvm.org/D151834) incorrectly
used intrinsics when math errno handling was needed at optimization
levels above -O0.
This also fixes the issue mentioned in https://reviews.llvm.org/D151834
by @uabelho.
This is joint work with Andy Kaylor (@andykaylor).
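A sketch of the behavior the fix preserves: with math errno enabled,
sqrt must stay a libcall rather than become llvm.sqrt, so the domain
error below remains observable (illustrative helper, not from the patch):
```
#include <cerrno>
#include <cmath>

// With -fmath-errno, sqrt(-1.0) must set errno to EDOM; the errno-free
// llvm.sqrt intrinsic would lose this side effect.
bool sqrt_domain_error() {
  errno = 0;
  volatile double r = std::sqrt(-1.0);  // volatile keeps the call alive
  (void)r;
  return errno == EDOM;
}
```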
The new ACLE PR#225[1] now combines the slice parameters for some
builtins.
Slice specifies the ZA slice number directly and needs to be explicitly
computed by the "user" as the base register plus the immediate
offset.
[1] https://github.com/ARM-software/acle/pull/225/files
Summary:
We use the `llvm.amdgcn.abi.version` variable to control code generation.
It is now emitted in every module to indicate what should be used when
compiling. Previously, the logic caused us to emit an external reference
to this variable when creating the code for the `none` type, which would
then cause us not to emit the actual definition. This patch refines the
logic to create the external reference, and then update it if it is
found unset by the time we emit the global. I had to remove the
reference to `GetOrCreateLLVMGlobal` because it did not accept the
proper address space.
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. This patch is the second of three patches updating the
interface.
Slice specifies the ZA slice number directly and needs to be explicitly
computed by the "user" as the base register plus the immediate
offset.
[1] https://github.com/ARM-software/acle/pull/225/files
Update DeviceRTL and the AMDGPU plugin to support code
object version 5. The default remains code object version 4.
CodeGen for __builtin_amdgcn_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via an -Xclang option to generate ABI-agnostic code.
The generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the ABI version, which is available during linking of the
user's OpenMP code. The load of this constant gets eliminated
during linking.
The AMDGPU plugin queries the ELF for the code object version
and then prepares the various implicitargs accordingly.
Differential Revision: https://reviews.llvm.org/D139730
Reviewed By: jhuber6, yaxunl
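A device-side sketch of the ABI-agnostic path (the branch on the control
constant is emitted by codegen, not written by hand; the helper name is
hypothetical):
```
// Built with -mcode-object-version=none, the expansion of this builtin
// branches on llvm.amdgcn.abi.version to read the workgroup size from
// either the cov4 or the cov5 location; the branch folds at link time.
unsigned workgroup_size_x() { return __builtin_amdgcn_workgroup_size_x(); }
```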
GCC 12 (https://gcc.gnu.org/PR101696) allows
__builtin_cpu_supports("x86-64") (and the -v2/-v3/-v4 variants).
This patch ports the feature.
* Add `FEATURE_X86_64_{BASELINE,V2,V3,V4}` to enum ProcessorFeatures,
but keep CPU_FEATURE_MAX unchanged to make
FeatureInfos/FeatureInfos_WithPLUS happy.
* Change validateCpuSupports to allow `x86-64{,-v2,-v3,-v4}`.
* Change getCpuSupportsMask to return `std::array<uint32_t, 4>` where
`x86-64{,-v2,-v3,-v4}` set bits `FEATURE_X86_64_{BASELINE,V2,V3,V4}`.
* `target("x86-64")` and `cpu_dispatch(x86_64)` are invalid. Tested by commit 9de3b35ac9159d5bae6e6796cb91e4f877a07189
Close https://github.com/llvm/llvm-project/issues/59961
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D158811
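A dispatch sketch using the newly accepted level names (hypothetical
helper):
```
// Pick a code path from the host's x86-64 micro-architecture level;
// "x86-64-v3" implies AVX2/FMA/BMI2, "x86-64-v2" implies SSE4.2/POPCNT.
int select_isa_level(void) {
  if (__builtin_cpu_supports("x86-64-v4")) return 4;
  if (__builtin_cpu_supports("x86-64-v3")) return 3;
  if (__builtin_cpu_supports("x86-64-v2")) return 2;
  return 1;
}
```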
The FP options specified through pragmas are already encoded in the Expr.
This patch takes the same approach used by clang codegen to emit
fast-math flags for fadd instructions: use RAII to set the
current fast-math flags in the IRBuilder, which are then used when
emitting the sqrt intrinsic.
Fixes: https://github.com/llvm/llvm-project/issues/64653
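A minimal sketch of the RAII pattern (standalone helper for
illustration, not the actual Clang code):
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"

// Scope the builder's fast-math flags so only the sqrt emitted here
// picks them up; the guard restores the previous flags on destruction.
llvm::Value *emitFastSqrt(llvm::IRBuilder<> &B, llvm::Value *X) {
  llvm::IRBuilderBase::FastMathFlagGuard Guard(B);
  llvm::FastMathFlags FMF;
  FMF.setFast();
  B.setFastMathFlags(FMF);
  return B.CreateUnaryIntrinsic(llvm::Intrinsic::sqrt, X);
}
```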
GCC 12 (https://gcc.gnu.org/PR101696) allows `arch=x86-64`
`arch=x86-64-v2` `arch=x86-64-v3` `arch=x86-64-v4` in the
target_clones function attribute. This patch ports the feature.
* Set KeyFeature to `x86-64{,-v2,-v3,-v4}` in `Processors[]`, to be used
by X86TargetInfo::multiVersionSortPriority
* builtins: change `__cpu_features2` to an array like libgcc. Define
`FEATURE_X86_64_{BASELINE,V2,V3,V4}` and dependent ISA feature bits.
* CGBuiltin.cpp: update EmitX86CpuSupports to handle `arch=x86-64*`.
Close https://github.com/llvm/llvm-project/issues/55830
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D158329
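A usage sketch of the newly supported names (hypothetical function):
```
// Clang emits one clone per listed target plus an ifunc resolver that
// picks the best clone for the host CPU at load time.
__attribute__((target_clones("x86-64-v4", "x86-64-v2", "default")))
long dot(const int *a, const int *b, int n) {
  long s = 0;
  for (int i = 0; i < n; ++i) s += (long)a[i] * b[i];
  return s;
}
```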
The static analyzer complains about a large function call parameter
being passed by value in CGBuiltin.cpp. The following functions take
their TypeFlags parameter of type clang::SVETypeFlags by value:
1. CodeGenFunction::EmitSMELdrStr(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int)
2. CodeGenFunction::EmitSMEZero(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int)
3. CodeGenFunction::EmitSMEReadWrite(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int)
4. CodeGenFunction::EmitSMELd1St1(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int)
In many other places in CGBuiltin.cpp the TypeFlags parameter is passed
by reference, and clang::SVETypeFlags inherits from several other types.
This patch passes the TypeFlags parameter by reference instead of by
value in these functions.
Reviewed By: tahonermann, sdesmalen
Differential Revision: https://reviews.llvm.org/D158522
Currently both Clang and GCC support the following set of flags that
control codegen of signed overflow:
* -fwrapv: overflow is defined as in two's complement
* -ftrapv: overflow traps
* -fsanitize=signed-integer-overflow: if undefined (no -fwrapv), then
overflow behaviour is controlled by the UBSan runtime; overrides -ftrapv
However, clang ignores these flags for __builtin_abs(int) and its
higher-width versions, so passing the minimum integer value always
causes poison.
The same holds for *abs(), which are not handled in the frontend at all but folded
to llvm.abs.* intrinsics during InstCombinePass. The intrinsics are not
instrumented by UBSan, so the functions need special handling as well.
This patch does a few things:
* Handle *abs() in CGBuiltin the same way as __builtin_*abs()
* -fsanitize=signed-integer-overflow now properly instruments abs() with UBSan
* -fwrapv and -ftrapv handling for abs() is made consistent with GCC
Fixes #45129 and #45794
Reviewed By: efriedma, MaskRay
Differential Revision: https://reviews.llvm.org/D156821
This reverts commit 1783185790de29b24d3850d33d9a9d586e6bbd39,
which broke the buildbots starting with its first build:
https://lab.llvm.org/buildbot/#/builders/85/builds/18390
(N.B. I think the patch is uncovering real bugs; the revert
is simply to keep the tree green and the buildbots useful, because I'm
not confident how to fix forward all the bugs it found.)
Currently both Clang and GCC support the following set of flags that
control codegen of signed overflow:
* -fwrapv: overflow is defined as in two's complement
* -ftrapv: overflow traps
* -fsanitize=signed-integer-overflow: if undefined (no -fwrapv), then
overflow behaviour is controlled by the UBSan runtime; overrides -ftrapv.
However, clang ignores these flags for __builtin_abs(int) and its
higher-width versions, so passing minimum integer value always causes
poison.
The same holds for *abs(), which are not handled in the frontend at all but
folded to llvm.abs.* intrinsics during InstCombinePass. The intrinsics
are not instrumented by UBSan, so the functions need special handling
as well.
This patch does a few things:
* Handle *abs() in CGBuiltin the same way as __builtin_*abs()
* -fsanitize=signed-integer-overflow now properly instruments abs() with
UBSan
* -fwrapv and -ftrapv handling for abs() is made consistent with GCC
Fixes https://github.com/llvm/llvm-project/issues/45129
Fixes https://github.com/llvm/llvm-project/issues/45794
Differential Revision: https://reviews.llvm.org/D156821
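A sketch of the now-consistent behavior (illustrative; exact diagnostics
depend on the sanitizer runtime):
```
#include <climits>
#include <cstdlib>

// abs(INT_MIN) overflows: with -fsanitize=signed-integer-overflow both
// calls are now instrumented, with -fwrapv both return INT_MIN, and
// with -ftrapv both trap, matching GCC.
int probe(int x) {
  return __builtin_abs(x) + std::abs(x);
}
```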
This allows use with non-zero address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through MLIR,
but this is enough to fix the existing tests.
https://reviews.llvm.org/D156666
Fixed the type modifier (L->W) and removed redundant feature-checking
code, since the feature has already been checked in `EmitBuiltinExpr`.
Also cleaned up unused diagnostic information.
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D156866
Since we no longer support typed LLVM IR pointer types, the code can
be simplified by, for example, using PointerType::get directly instead
of Type::getInt8PtrTy, Type::getInt32PtrTy, etc.
Differential Revision: https://reviews.llvm.org/D156733
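A minimal sketch of the simplification (hypothetical helper):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

// With opaque pointers there is a single ptr type per address space, so
// the element-typed getters reduce to one PointerType::get call.
llvm::PointerType *defaultPtrTy(llvm::LLVMContext &Ctx) {
  return llvm::PointerType::get(Ctx, /*AddressSpace=*/0);
}
```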
`alloca` instructions always return pointers to the `alloca` address space. This composes poorly with most HLLs, which are address-space agnostic and thus have all pointers point to the generic/default address space. Static `alloca`s were already handled on the AST level; however, dynamic `alloca`s were not, which would lead to subtly incorrect IR. This patch addresses that by inserting an address space cast iff the `alloca` address space differs from the default / expected one.
Reviewed By: rjmccall, arsenm
Differential Revision: https://reviews.llvm.org/D156539
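A minimal sketch of the cast being inserted (assumed helper, not the
actual Clang code; on AMDGPU, for instance, allocas live in AS 5):
```
#include "llvm/IR/IRBuilder.h"

// Emit a dynamic alloca, then cast it to the default address space when
// the target's alloca address space differs from it.
llvm::Value *emitDynAlloca(llvm::IRBuilder<> &B, llvm::Type *Ty,
                           llvm::Value *NumElts, unsigned AllocaAS,
                           unsigned DefaultAS) {
  llvm::AllocaInst *AI = B.CreateAlloca(Ty, AllocaAS, NumElts, "dyn.alloca");
  if (AllocaAS == DefaultAS)
    return AI;
  return B.CreateAddrSpaceCast(
      AI, llvm::PointerType::get(B.getContext(), DefaultAS));
}
```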
Add codegen for the llvm.bitreverse elementwise builtin.
The bitreverse elementwise builtin is necessary for HLSL codegen.
Tests were added to make sure that the expected errors are emitted when
the builtin is given inputs of incompatible types or too many inputs.
The new builtin is restricted to integer types only.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156357
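A usage sketch (builtin spelling per Clang's elementwise builtins):
```
typedef unsigned uint4 __attribute__((ext_vector_type(4)));

// Works on integer scalars and element-wise on integer vectors.
unsigned rev_scalar(unsigned x) { return __builtin_elementwise_bitreverse(x); }
uint4 rev_vector(uint4 v) { return __builtin_elementwise_bitreverse(v); }
```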
Add codegen for the llvm.pow elementwise builtin.
The pow elementwise builtin is necessary for HLSL codegen.
Tests were added to make sure that the expected errors are emitted when
the builtin is given inputs of incompatible types or too many inputs.
The new builtin is restricted to floating point types only.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153310
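A usage sketch mirroring the bitreverse example above:
```
typedef float float4 __attribute__((ext_vector_type(4)));

// Element-wise power; restricted to floating-point types.
float4 pow4(float4 base, float4 exp) {
  return __builtin_elementwise_pow(base, exp);
}
```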
The specification for LDR/STR says that:
The ZA array vector is selected by the sum of the vector select register
and immediate offset, modulo the number of bytes in a Streaming SVE
vector. [..] This instruction does not require the PE to be in Streaming
SVE mode
When the instruction is used outside of streaming mode, 'vscale' results
in the wrong value being used for the offset, because LLVM's code
generator emits the non-streaming 'RDVL/ADDVL' instructions instead of
the 'RDSVL/ADDSVL' instructions that are used to get the streaming SVE
vector length.
Reviewed By: bryanpkc
Differential Revision: https://reviews.llvm.org/D156121
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):
- svaddha_za32[_u32]_m // also for s32
- svaddva_za32[_u32]_m // also for s32
- svaddha_za64[_u64]_m // also for s64
- svaddva_za64[_u64]_m // also for s64
The _za64 versions are available only when the sme-i16i64 feature is enabled.
Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D134680
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):
- svread_hor_za8[_s8]_m // also for u8
- svread_hor_za16[_s16]_m // also for u16, f16, bf16
- svread_hor_za32[_s32]_m // also for u32, f32
- svread_hor_za64[_s64]_m // also for u64, f64
- svread_hor_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
- svread_ver_za8[_s8]_m // also for u8
- svread_ver_za16[_s16]_m // also for u16, f16, bf16
- svread_ver_za32[_s32]_m // also for u32, f32
- svread_ver_za64[_s64]_m // also for u64, f64
- svread_ver_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
- svwrite_hor_za8[_s8]_m // also for u8
- svwrite_hor_za16[_s16]_m // also for u16, f16, bf16
- svwrite_hor_za32[_s32]_m // also for u32, f32
- svwrite_hor_za64[_s64]_m // also for u64, f64
- svwrite_hor_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
- svwrite_ver_za8[_s8]_m // also for u8
- svwrite_ver_za16[_s16]_m // also for u16, f16, bf16
- svwrite_ver_za32[_s32]_m // also for u32, f32
- svwrite_ver_za64[_s64]_m // also for u64, f64
- svwrite_ver_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>
Reviewed By: sdesmalen, kmclaughlin
Differential Revision: https://reviews.llvm.org/D128648
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign-extended to 64 bits, as *W instructions do.
This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics in https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44.
I've included IR autoupgrade support as well.
I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D154647
This removes another use of 'long' to mean xlen from builtins.
I've also converted the types to unsigned as proposed in D154616.
clmul_32 is available on RV64, as its emulation is clmul+sext.w.
clmulh_32 and clmulr_32 are not available on RV64, as their emulation
is currently 6 instructions in the worst case.
OpenCL and HIP have -cl-fp32-correctly-rounded-divide-sqrt and
-fno-hip-fp32-correctly-rounded-divide-sqrt. The corresponding fpmath
metadata was only set on fdiv, and not on sqrt. The backend is currently
underutilizing sqrt lowering options; the responsibility is split
between the libraries and the backend, and this metadata is needed.
CUDA/NVCC has -prec-div and -prec-sqrt, but clang doesn't appear to be
aiming for compatibility with those. It is unclear whether OpenMP has a
similar control.
D152023 made UBSan consider __builtin_clz of 0 undefined regardless of
the target. This ensures portability and matches GCC.
It also caused the ACLE intrinsics to be considered undefined for 0,
since they use the generic builtins as their implementation.
This patch adds builtins for ARM that UBSan doesn't know about, to make
the behavior defined for 0. Alternatively, I could have added a zero
check to the intrinsics, but the dedicated builtins give better -O0
codegen.
Fixes #63113.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D154915
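A sketch of the ACLE semantics being kept defined (requires an Arm
target; header and semantics per the ACLE):
```
#include <arm_acle.h>

// ACLE defines __clz(0) == 32, so the intrinsic now lowers to a
// dedicated ARM builtin and UBSan no longer flags a zero argument.
unsigned int leading_zeros(unsigned int x) { return __clz(x); }
```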
Builtin floating-point number classification functions:
- __builtin_isnan,
- __builtin_isinf,
- __builtin_finite, and
- __builtin_isnormal
are now implemented using `llvm.is_fpclass`.
This change makes the target callback `TargetCodeGenInfo::testFPKind`
unneeded. It is preserved in this change and should be removed later.
Differential Revision: https://reviews.llvm.org/D112932
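For illustration, the lowering this enables (mask value 3 selects
signaling-or-quiet NaN per the llvm.is_fpclass test-bit encoding):
```
// The C++ below now lowers to a single target-independent intrinsic:
//   %res = call i1 @llvm.is_fpclass.f64(double %x, i32 3)
bool is_nan(double x) { return __builtin_isnan(x); }
```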
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
This adds new intrinsics to support the LDAP1 and STL1 Advanced SIMD
(Neon) instructions introduced as part of FEAT_LRCPC3.
The new intrinsics `vldap1(q)_lane`/`vstl1(q)_lane` generate IR code
similar to the existing `vld1(q)_lane/st1(q)_lane` ones, but capturing
the difference in the atomic release/acquire memory model.
The LLVM code generation changes to ensure that this instruction pair
is lowered to the correct LDAP1/STL1 instructions will be covered in a
separate commit.
Based on a patch by Sam Elliott.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D153128
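A usage sketch (AArch64 with FEAT_LRCPC3; intrinsic spellings per this
change, with an assumed lane/type combination):
```
#include <arm_neon.h>

// Acquire-load lane 0 of a 64-bit vector and release-store it back,
// mirroring vld1q_lane/vst1q_lane but with atomic ordering.
int64x2_t roundtrip(const int64_t *src, int64_t *dst, int64x2_t v) {
  v = vldap1q_lane_s64(src, v, 0);
  vstl1q_lane_s64(dst, v, 0);
  return v;
}
```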