1787 Commits

Pavel Iliin
8ec50d6446 [AArch64] Fix FMV ifunc resolver usage on old Android APIs. Rename internal compiler-rt FMV functions.
The patch fixes Function Multi Versioning feature detection by the ifunc
resolver on Android API levels < 30.
Ifunc hwcaps parameters are not supported on Android API levels 23-29,
so all CPU features are marked unsupported if they were not initialized
before the ifunc resolver call.
There is no support for ifunc on Android API levels < 23, so Function
Multi Versioning is disabled in that case.

Also use a two-underscore prefix for the FMV runtime support functions to
avoid conflicts with user program ones.
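
As a hedged illustration (my example, not part of the patch), an AArch64
function-multi-versioned routine whose dispatch runs through this compiler-rt
ifunc resolver might look like:

```c
/* Hedged sketch: dispatch between these clones happens in the compiler-rt
   ifunc resolver whose Android API < 30 behavior this patch fixes. */
__attribute__((target_clones("sve2", "default")))
int calc(int x) { return x * 2; }
```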

Differential Revision: https://reviews.llvm.org/D158641
2023-09-29 17:10:48 +01:00
Fangrui Song
0d8b864829 CGBuiltin: emit llvm.abs.* instead of neg+icmp+select for abs
instcombine will combine neg+icmp+select to llvm.abs.*. Let's just emit
llvm.abs.* in the first place.
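
For illustration, a minimal source pattern affected by this change (my
example, not from the commit):

```c
/* Before: neg + icmp + select; after: a direct llvm.abs.i32 call. */
int iabs(int x) { return __builtin_abs(x); }
```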
2023-09-27 21:29:56 -07:00
Phoebe Wang
31631d307f
[X86][FP16] Add missing handling for FP16 constrained cmp intrinsics (#67400) 2023-09-26 19:27:57 +08:00
Björn Pettersson
b4858c634e
[clang][CodeGen] Simplify code based on opaque pointers (#65624)
- Update CodeGenTypeCache to use a single union for all pointers in
  address space zero.
- Introduce a UnqualPtrTy in CodeGenTypeCache, and use that (for
  example instead of llvm::PointerType::getUnqual) in some places.
- Drop some redundant bit/pointer casts from ptr to ptr.
2023-09-25 11:21:24 +02:00
Carlos Eduardo Seo
7523550853
[Clang][CodeGen] Add __builtin_bcopy (#67130)
Add __builtin_bcopy to the list of GNU builtins; its absence was causing a
series of test failures in glibc.

Adjust the tests to reflect the changes in codegen.

Fixes #51409.
Fixes #63065.
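
A minimal usage sketch (my example; note bcopy's (src, dst, len) argument
order, the reverse of memcpy):

```c
#include <stddef.h>

void copy_bytes(char *dst, const char *src, size_t n) {
  /* GNU bcopy semantics: source first, destination second */
  __builtin_bcopy(src, dst, n);
}
```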
2023-09-24 11:58:14 -03:00
Amy Huang
03c698a431
[MSVC, ARM64] Add _Copy* and _Count* intrinsics (#66554)
Implement the _Count* and _Copy* Windows ARM intrinsics:

```
double _CopyDoubleFromInt64(__int64)
float _CopyFloatFromInt32(__int32)
__int32 _CopyInt32FromFloat(float)
__int64 _CopyInt64FromDouble(double)
unsigned int _CountLeadingOnes(unsigned long)
unsigned int _CountLeadingOnes64(unsigned __int64)
unsigned int _CountLeadingSigns(long)
unsigned int _CountLeadingSigns64(__int64)
unsigned int _CountLeadingZeros(unsigned long)
unsigned int _CountLeadingZeros64(unsigned __int64)
unsigned int _CountOneBits(unsigned long)
unsigned int _CountOneBits64(unsigned __int64)
```

Full list of intrinsics here:
[https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics](https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics)

Bug: [65405](https://github.com/llvm/llvm-project/issues/65405)
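
A hedged usage sketch (assumes an MSVC-mode ARM64 target where these are
exposed via intrin.h):

```c
#include <intrin.h>

unsigned int lz(unsigned long v) { return _CountLeadingZeros(v); }

/* Bit-pattern reinterpretation, not a numeric conversion. */
double from_bits(__int64 bits) { return _CopyDoubleFromInt64(bits); }
```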
2023-09-21 14:34:59 -07:00
Zahira Ammarguellat
a292e7edf8
Fix math-errno issue (#66381)
Update handling of math errno. This change updates the logic for
generating math intrinsics in place of math library function calls.
The previous logic (https://reviews.llvm.org/D151834) incorrectly
used intrinsics when math errno handling was needed at optimization
levels above -O0.
This also fixes the issue mentioned in https://reviews.llvm.org/D151834
by @uabelho.
This is joint work with Andy Kaylor (@andykaylor).
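
A sketch of why errno handling blocks the intrinsic (my example, assuming
-fmath-errno is in effect):

```c
#include <errno.h>
#include <math.h>

/* sqrt(-1.0) must set errno to EDOM under -fmath-errno, so the call
   cannot be replaced by the errno-free llvm.sqrt intrinsic. */
int is_domain_error(double x) {
  errno = 0;
  (void)sqrt(x);
  return errno == EDOM;
}
```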
2023-09-19 09:13:02 -04:00
Matt Arsenault
ddc3346a6b
clang/AMDGPU: Fix accidental behavior change for __builtin_amdgcn_ldexph (#66340) 2023-09-14 18:15:44 +03:00
CarolineConcatto
ee31ba0dd9
[AArch64][SME]Update intrinsic interface for ld1/st1 (#65582)
The new ACLE PR#225[1] now combines the slice parameters for some
builtins.
Slice specifies the ZA slice number directly and needs to be explicitly
computed by the user as the base register plus the immediate
offset.

[1]https://github.com/ARM-software/acle/pull/225/files
2023-09-13 15:24:09 +01:00
Joseph Huber
49ff6a96a7
[Clang] Define AMDGPU ABI when referenced in CodeGen for ABI "none" (#66162)
Summary:
We use the `llvm.amdgcn.abi.version` variable to control code generation.
This is now emitted in every module to indicate what should be used when
compiling. Previously, the logic caused us to emit an external reference
to this variable when creating the code for the `none` type. This would
then cause us not to emit the actual definition. This patch refines the
logic to create the external reference, and then update it if it is
found unset by the time we emit the global. I had to remove the
reference to `GetOrCreateLLVMGlobal` because it did not accept the
proper address space.
2023-09-13 08:31:31 -05:00
CarolineConcatto
dc8d2ecc5e
[AArch64][SME]Update intrinsic interface for read/write (#65594)
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. This patch is the second of three patches updating the interface.

Slice specifies the ZA slice number directly and needs to be explicitly
computed by the user as the base register plus the immediate
offset.

[1]https://github.com/ARM-software/acle/pull/225/files
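
A hedged usage sketch of the combined-slice interface (the header name and
exact prototype follow the then in-flux SME ACLE and are assumptions here;
required ZA-state attributes are omitted):

```c
#include <arm_sme.h>

/* Hedged sketch: merge-read one horizontal slice of 32-bit tile 0.
   The single slice argument replaces the old base + immediate pair,
   so the caller forms base + offset itself. */
svint32_t read_row(svint32_t inactive, svbool_t pg, uint32_t slice_base) {
  return svread_hor_za32_s32_m(inactive, pg, 0, slice_base + 1);
}
```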
2023-09-12 18:08:57 +01:00
CarolineConcatto
7b8d4eff02
[AArch64][SME]Update intrinsic interface for ldr/str (#65593)
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. 

[1]https://github.com/ARM-software/acle/pull/225/files
2023-09-12 17:31:51 +01:00
Max Iyengar
dbeb3d029d Add missing vrnd intrinsics
This patch adds 8 missing intrinsics as specified in the Arm ACLE document, section 2.12.1.1: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#rounding-3

The intrinsics implemented are:

  - vrnd32z_f64
  - vrnd32zq_f64
  - vrnd64z_f64
  - vrnd64zq_f64
  - vrnd32x_f64
  - vrnd32xq_f64
  - vrnd64x_f64
  - vrnd64xq_f64
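
A hedged usage sketch (my example; requires an Armv8.5-A target with the
FRINT32/FRINT64 extension):

```c
#include <arm_neon.h>

/* Round each lane toward zero to a value representable as a 32-bit int. */
float64x2_t round_to_i32_range(float64x2_t v) { return vrnd32zq_f64(v); }
```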

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D158626
2023-09-11 12:59:18 +01:00
Matt Arsenault
6a08cf12d9 clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic
https://reviews.llvm.org/D157911
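
Minimal usage (my example):

```c
/* Lowers to the new llvm.exp10 intrinsic instead of pow(10, x) or a libcall. */
double ten_pow(double x) { return __builtin_exp10(x); }
float ten_powf(float x) { return __builtin_exp10f(x); }
```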
2023-09-09 23:14:12 +03:00
Zahira Ammarguellat
2c93e3c1c8 Take math-errno into account with '#pragma float_control(precise,on)' and
'__attribute__((optnone))'.
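
A sketch of the affected pattern (my example, assuming default -fmath-errno
semantics):

```c
#include <math.h>

#pragma float_control(precise, on)
/* With precise float_control, math-errno is honored again, so this sqrt
   is emitted as a library call rather than an errno-ignoring intrinsic. */
double precise_sqrt(double x) { return sqrt(x); }
```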

Differential Revision: https://reviews.llvm.org/D151834
2023-09-08 09:48:53 -04:00
Saiyedul Islam
f616c3eeb4
[OpenMP][DeviceRTL][AMDGPU] Support code object version 5
Update DeviceRTL and the AMDGPU plugin to support code
object version 5. The default remains code object version 4.

CodeGen for __builtin_amdgcn_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via an -Xclang option to generate ABI-agnostic code.

Generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the ABI version, which is available during linking of the
user's OpenMP code. The load of this constant gets eliminated
during linking.

AMDGPU plugin queries the ELF for code object version
and then prepares various implicitargs accordingly.

Differential Revision: https://reviews.llvm.org/D139730

Reviewed By: jhuber6, yaxunl
2023-08-29 06:35:44 -05:00
Kazu Hirata
722474969e [clang] Use SmallDenseMap::contains (NFC) 2023-08-27 08:26:50 -07:00
Fangrui Song
27da15381c [X86] __builtin_cpu_supports: support x86-64{,-v2,-v3,-v4}
GCC 12 (https://gcc.gnu.org/PR101696) allows
__builtin_cpu_supports("x86-64") (and -v2 -v3 -v4).
This patch ports the feature.

* Add `FEATURE_X86_64_{BASELINE,V2,V3,V4}` to enum ProcessorFeatures,
  but keep CPU_FEATURE_MAX unchanged to make
  FeatureInfos/FeatureInfos_WithPLUS happy.
* Change validateCpuSupports to allow `x86-64{,-v2,-v3,-v4}`
* Change getCpuSupportsMask to return `std::array<uint32_t, 4>` where
  `x86-64{,-v2,-v3,-v4}` set bits `FEATURE_X86_64_{BASELINE,V2,V3,V4}`.
* `target("x86-64")` and `cpu_dispatch(x86_64)` are invalid. Tested by commit 9de3b35ac9159d5bae6e6796cb91e4f877a07189

Close https://github.com/llvm/llvm-project/issues/59961
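
Minimal usage of the ported feature (my example):

```c
int use_avx2_path(void) {
  /* New micro-architecture level names accepted alongside plain features. */
  return __builtin_cpu_supports("x86-64-v3");
}
```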

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D158811
2023-08-25 20:56:25 -07:00
Yaxun (Sam) Liu
63f0833ec4 [clang] Fix missing contract flag in sqrt intrinsic
The fp options specified through pragma are already encoded in Expr.

This patch takes the same approach used by clang codegen to emit
fastmath flags for fadd insts, basically use RAII to set the
current fastmath flags in IRBuilder, which is then used to emit
sqrt intrinsic.

Fixes: https://github.com/llvm/llvm-project/issues/64653
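
A sketch of the fixed pattern (my example):

```c
float contracted(float x) {
#pragma clang fp contract(fast)
  /* The contract flag from the pragma now reaches the sqrt intrinsic,
     allowing fusion with the following multiply. */
  return __builtin_sqrtf(x) * x;
}
```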
2023-08-24 19:15:17 -04:00
Fangrui Song
7a41af8604 [X86] Support arch=x86-64{,-v2,-v3,-v4} for target_clones attribute
GCC 12 (https://gcc.gnu.org/PR101696) allows `arch=x86-64`
`arch=x86-64-v2` `arch=x86-64-v3` `arch=x86-64-v4` in the
target_clones function attribute. This patch ports the feature.

* Set KeyFeature to `x86-64{,-v2,-v3,-v4}` in `Processors[]`, to be used
  by X86TargetInfo::multiVersionSortPriority
* builtins: change `__cpu_features2` to an array like libgcc. Define
  `FEATURE_X86_64_{BASELINE,V2,V3,V4}` and the ISA feature bits they depend on.
* CGBuiltin.cpp: update EmitX86CpuSupports to handle `arch=x86-64*`.

Close https://github.com/llvm/llvm-project/issues/55830
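
Minimal usage of the ported attribute (my example):

```c
/* The resolver picks the best clone for the running CPU's level. */
__attribute__((target_clones("arch=x86-64", "arch=x86-64-v2",
                             "arch=x86-64-v3", "default")))
int kernel(void) { return 42; }
```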

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D158329
2023-08-23 22:08:55 -07:00
Manna, Soumi
30c60ec52f [NFC][CLANG] Fix static analyzer bugs about large copy by values
The static analyzer complains about large function call parameters which are passed by value in the CGBuiltin.cpp file.

1. In CodeGenFunction::EmitSMELdrStr(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int): We are passing parameter TypeFlags of type clang::SVETypeFlags by value.

2. In CodeGenFunction::EmitSMEZero(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int): We are passing parameter TypeFlags of type clang::SVETypeFlags by value.

3. In CodeGenFunction::EmitSMEReadWrite(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int): We are passing parameter TypeFlags of type clang::SVETypeFlags by value.

4. In CodeGenFunction::EmitSMELd1St1(clang::SVETypeFlags, llvm::SmallVectorImpl<llvm::Value *> &, unsigned int): We are passing parameter TypeFlags of type clang::SVETypeFlags by value.

In many other places in the CGBuiltin.cpp file, we already pass the TypeFlags parameter of type clang::SVETypeFlags by reference.

clang::SVETypeFlags inherits from several other types.

This patch passes the TypeFlags parameter by reference instead of by value in these functions.

Reviewed By: tahonermann, sdesmalen

Differential Revision: https://reviews.llvm.org/D158522
2023-08-23 07:57:04 -07:00
Artem Labazov
f0bbda00bd [CodeGen] [ubsan] Respect integer overflow handling in abs builtin
Currently both Clang and GCC support the following set of flags that control
code generation for signed overflow:

* -fwrapv: overflow is defined as in two's complement
* -ftrapv: overflow traps
* -fsanitize=signed-integer-overflow: if overflow is undefined (no -fwrapv),
  its behaviour is controlled by the UBSan runtime; overrides -ftrapv

However, clang ignores these flags for __builtin_abs(int) and its higher-width
versions, so passing the minimum integer value always causes poison.

The same holds for *abs(), which are not handled in frontend at all but folded
to llvm.abs.* intrinsics during InstCombinePass. The intrinsics are not
instrumented by UBSan, so the functions need special handling as well.

This patch does a few things:

* Handle *abs() in CGBuiltin the same way as __builtin_*abs()
* -fsanitize=signed-integer-overflow now properly instruments abs() with UBSan
* -fwrapv and -ftrapv handling for abs() is made consistent with GCC

Fixes #45129 and #45794
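
A sketch of the now-instrumented pattern (my example):

```c
#include <limits.h>
#include <stdlib.h>

/* With -fsanitize=signed-integer-overflow this is reported by UBSan;
   with -fwrapv it wraps; with -ftrapv it traps -- matching GCC. */
int min_abs(void) { return abs(INT_MIN); }
```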

Reviewed By: efriedma, MaskRay

Differential Revision: https://reviews.llvm.org/D156821
2023-08-21 09:13:25 -07:00
Thurston Dang
fc06cce30d Revert "Respect integer overflow handling in abs builtin"
This reverts commit 1783185790de29b24d3850d33d9a9d586e6bbd39,
which broke the buildbots, starting with when it was first built in https://lab.llvm.org/buildbot/#/builders/85/builds/18390

(N.B. I think the patch is uncovering real bugs; the revert
is simply to keep the tree green and the buildbots useful, because I'm not confident how to
fix-forward all the found bugs.)
2023-08-18 19:59:34 +00:00
Artem Labazov
1783185790 Respect integer overflow handling in abs builtin
Currently both Clang and GCC support the following set of flags that
control code generation for signed overflow:

* -fwrapv: overflow is defined as in two's complement
* -ftrapv: overflow traps
* -fsanitize=signed-integer-overflow: if overflow is undefined (no -fwrapv),
its behaviour is controlled by the UBSan runtime; overrides -ftrapv.

However, clang ignores these flags for __builtin_abs(int) and its
higher-width versions, so passing the minimum integer value always causes
poison.

The same holds for *abs(), which are not handled in frontend at all but
folded to llvm.abs.* intrinsics during InstCombinePass. The intrinsics
are not instrumented by UBSan, so the functions need special handling
as well.

This patch does a few things:

* Handle *abs() in CGBuiltin the same way as __builtin_*abs()
* -fsanitize=signed-integer-overflow now properly instruments abs() with
UBSan
* -fwrapv and -ftrapv handling for abs() is made consistent with GCC

Fixes https://github.com/llvm/llvm-project/issues/45129
Fixes https://github.com/llvm/llvm-project/issues/45794

Differential Revision: https://reviews.llvm.org/D156821
2023-08-17 14:36:00 -04:00
Matt Arsenault
9e3d9c9eae clang: Add __builtin_elementwise_sqrt
This will be used in the opencl builtin headers to provide direct
intrinsic access with proper !fpmath metadata.

https://reviews.llvm.org/D156737
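
Minimal usage (my example):

```c
typedef float float4 __attribute__((ext_vector_type(4)));

/* Elementwise sqrt over a vector; !fpmath can then be attached per language mode. */
float4 vec_sqrt(float4 v) { return __builtin_elementwise_sqrt(v); }
```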
2023-08-11 19:32:39 -04:00
Bjorn Pettersson
d03f4177df [clang] Drop some references to typed pointers (getInt8PtrTy). NFC
Differential Revision: https://reviews.llvm.org/D157550
2023-08-10 15:07:06 +02:00
Matt Arsenault
25bc999d1f Intrinsics: Add type overload to stacksave and stackstore
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through MLIR,
but this is enough to fix existing tests.

https://reviews.llvm.org/D156666
2023-08-09 18:33:11 -04:00
wanglei
ea8d3b1f9f [Clang][LoongArch] Use the ClangBuiltin class to automatically generate support for CBE and CFE
Fixed the type modifier (L->W) and removed redundant feature-checking code,
since the feature has already been checked in `EmitBuiltinExpr`. Also
cleaned up unused diagnostic information.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D156866
2023-08-09 16:04:09 +08:00
Piyou Chen
2df05cd01c [RISCV] Support overloaded version ntlh intrinsic function
Here is the proposal https://github.com/riscv-non-isa/riscv-c-api-doc/pull/47.

The versions that omit the domain argument imply domain=__RISCV_NTLH_ALL.

```
type __riscv_ntl_load (type *ptr);
void __riscv_ntl_store (type *ptr, type val);
```
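
A hedged usage sketch of the overloaded forms (assumes the riscv_ntlh.h header):

```c
#include <riscv_ntlh.h>

/* Domain argument omitted, so __RISCV_NTLH_ALL is implied. */
int load_nt(int *p) { return __riscv_ntl_load(p); }
void store_nt(int *p, int v) { __riscv_ntl_store(p, v); }
```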

Reviewed By: kito-cheng, craig.topper

Differential Revision: https://reviews.llvm.org/D156221
2023-08-04 00:39:25 -07:00
Bjorn Pettersson
2bdc86484d [clang][CodeGen] Drop some typed pointer bitcasts
Differential Revision: https://reviews.llvm.org/D156911
2023-08-03 22:54:33 +02:00
Bjorn Pettersson
fd05c34b18 Stop using legacy helpers indicating typed pointer types. NFC
Since we no longer support typed LLVM IR pointer types, the code can
be simplified to, for example, use PointerType::get directly instead
of Type::getInt8PtrTy, Type::getInt32PtrTy, etc.

Differential Revision: https://reviews.llvm.org/D156733
2023-08-02 12:08:37 +02:00
Alex Voicu
51a014cb2d [Clang][CodeGen] __builtin_allocas should care about address spaces
`alloca` instructions always return pointers to the `alloca` address space. This composes poorly with most HLLs, which are address-space agnostic and thus have all pointers point to generic/default. Static `alloca`s were already handled on the AST level; however, dynamic `alloca`s were not, which would lead to subtly incorrect IR. This patch addresses that by inserting an address space cast iff the `alloca` address space is different from the default / expected one.
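
A sketch of the affected pattern (my example; relevant on targets such as
AMDGPU whose alloca address space is non-default):

```c
#include <stddef.h>

void fill(size_t n) {
  /* The returned pointer is now addrspace-cast to the default AS when
     the target's alloca address space differs from it. */
  char *buf = (char *)__builtin_alloca(n);
  for (size_t i = 0; i < n; ++i)
    buf[i] = 0;
}
```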

Reviewed By: rjmccall, arsenm

Differential Revision: https://reviews.llvm.org/D156539
2023-08-01 21:55:36 +01:00
Joshua Batista
57f879cdd4 clang: Add elementwise bitreverse builtin
Add codegen for llvm bitreverse elementwise builtin
The bitreverse elementwise builtin is necessary for HLSL codegen.
Tests were added to make sure that the expected errors are encountered when these functions are given inputs of incompatible types, or too many inputs.
The new builtin is restricted to integer types only.
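
Minimal usage (my example):

```c
/* Integer-only elementwise builtin; also accepts integer vector types. */
unsigned rev_bits(unsigned x) { return __builtin_elementwise_bitreverse(x); }
```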

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156357
2023-07-31 10:59:13 -07:00
ranapratap55
970569b6cc [AMDGPU] __builtin_amdgcn_read_exec_* should be implemented with llvm.amdgcn.ballot
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156219
2023-07-26 16:21:31 +05:30
Joshua Batista
3a98e73169 clang: Add elementwise pow builtin
Add codegen for llvm pow elementwise builtin
The pow elementwise builtin is necessary for HLSL codegen.
Tests were added to make sure that the expected errors are encountered when these functions are given inputs of incompatible types, or too many inputs.
The new builtin is restricted to floating point types only.
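
Minimal usage (my example):

```c
typedef float float4 __attribute__((ext_vector_type(4)));

/* Elementwise pow; floating-point scalars and vectors only. */
float4 vec_pow(float4 base, float4 exp) {
  return __builtin_elementwise_pow(base, exp);
}
```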

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153310
2023-07-24 14:03:58 -07:00
Sander de Smalen
a8cbd27d1f [Clang][AArch64] svldr_vnum/svstr_vnum should use cntsb instead of vscale for the offset
The specification for LDR/STR says that:

  The ZA array vector is selected by the sum of the vector select register
  and immediate offset, modulo the number of bytes in a Streaming SVE
  vector. [..] This instruction does not require the PE to be in Streaming
  SVE mode

When the instruction is used outside of streaming mode, 'vscale' will result
in the wrong value being used for the offset, because LLVM's code generator
will emit the non-streaming 'RDVL/ADDVL' instead of the 'RDSVL/ADDSVL'
instructions, which are used to get the Streaming-SVE vector length.

Reviewed By: bryanpkc

Differential Revision: https://reviews.llvm.org/D156121
2023-07-24 14:29:45 +00:00
Bryan Chan
4ae900c063 [Clang][AArch64][SME] Add intrinsics for adding vector elements to ZA tile
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):

  - svaddha_za32[_u32]_m // also for s32
  - svaddva_za32[_u32]_m // also for s32
  - svaddha_za64[_u64]_m // also for s64
  - svaddva_za64[_u64]_m // also for s64

The _za64 versions are available only when the sme-i16i64 feature is enabled.

Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D134680
2023-07-20 06:06:36 -04:00
Bryan Chan
15d16a79a0 [Clang][AArch64][SME] Add intrinsics for reading streaming vector length
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):

  - svcntsb
  - svcntsh
  - svcntsw
  - svcntsd
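
A hedged usage sketch (the SME ACLE header name was still settling at the
time and is an assumption here):

```c
#include <arm_sme.h>
#include <stdint.h>

/* Number of bytes in a streaming SVE vector; streaming-compatible. */
uint64_t streaming_vl_bytes(void) { return svcntsb(); }
```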

Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D134679
2023-07-20 06:06:35 -04:00
Bryan Chan
f225898a7c [Clang][AArch64][SME] Add intrinsics for ZA array load/store (LDR/STR)
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):

  - svldr_vnum_za
  - svstr_vnum_za

Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D134678
2023-07-20 06:06:35 -04:00
Bryan Chan
578b0bd4e6 [Clang][AArch64][SME] Add ZA zeroing intrinsics
This patch adds support for the following SME ACLE intrinsics (as defined
 in https://arm-software.github.io/acle/main/acle.html):

   - svzero_mask_za
   - svzero_za
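
A hedged usage sketch (header name assumed; the caller's required ZA-state
attributes are omitted for brevity):

```c
#include <arm_sme.h>

/* Hedged sketch: zero only the 64-bit tiles selected by the mask, or the
   whole ZA array. */
void clear_tile0(void) { svzero_mask_za(0x01); }
void clear_all(void)   { svzero_za(); }
```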

Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D134677
2023-07-20 06:06:34 -04:00
Bryan Chan
6dc94c54e5 [Clang][AArch64][SME] Add vector read/write (mova) intrinsics
This patch adds support for the following SME ACLE intrinsics (as defined
in https://arm-software.github.io/acle/main/acle.html):

  - svread_hor_za8[_s8]_m    // also for u8
  - svread_hor_za16[_s16]_m  // also for u16, f16, bf16
  - svread_hor_za32[_s32]_m  // also for u32, f32
  - svread_hor_za64[_s64]_m  // also for u64, f64
  - svread_hor_za128[_s8]_m  // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
  - svread_ver_za8[_s8]_m    // also for u8
  - svread_ver_za16[_s16]_m  // also for u16, f16, bf16
  - svread_ver_za32[_s32]_m  // also for u32, f32
  - svread_ver_za64[_s64]_m  // also for u64, f64
  - svread_ver_za128[_s8]_m  // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
  - svwrite_hor_za8[_s8]_m   // also for u8
  - svwrite_hor_za16[_s16]_m // also for u16, f16, bf16
  - svwrite_hor_za32[_s32]_m // also for u32, f32
  - svwrite_hor_za64[_s64]_m // also for u64, f64
  - svwrite_hor_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
  - svwrite_ver_za8[_s8]_m   // also for u8
  - svwrite_ver_za16[_s16]_m // also for u16, f16, bf16
  - svwrite_ver_za32[_s32]_m // also for u32, f32
  - svwrite_ver_za64[_s64]_m // also for u64, f64
  - svwrite_ver_za128[_s8]_m // also for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64

Co-authored-by: Sagar Kulkarni <sagar.kulkarni1@huawei.com>

Reviewed By: sdesmalen, kmclaughlin

Differential Revision: https://reviews.llvm.org/D128648
2023-07-20 06:06:33 -04:00
Craig Topper
a64b3e92c7 [RISCV] Re-define sha256, Zksed, and Zksh intrinsics to use i32 types.
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign-extended to 64 bits, like the *W instructions.

This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44

I've included IR autoupgrade support as well.

I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D154647
2023-07-17 08:58:29 -07:00
Craig Topper
143e2c2ac0 [RISCV] Split clmul/clmulh/clmulr builtins into _32 and _64 versions.
This removes another use of 'long' to mean xlen from builtins.

I've also converted the types to unsigned as proposed in D154616.

clmul_32 is available on RV64, as its emulation is clmul+sext.w.
clmulh_32 and clmulr_32 are not available on RV64, as their emulation
is currently 6 instructions in the worst case.
2023-07-14 19:09:15 -07:00
Matt Arsenault
bac2a07540 clang: Attach !fpmath metadata to __builtin_sqrt based on language flags
OpenCL and HIP have -cl-fp32-correctly-rounded-divide-sqrt and
-fno-hip-correctly-rounded-divide-sqrt. The corresponding fpmath metadata
was only set on fdiv, and not sqrt. The backend is currently underutilizing
sqrt lowering options; the responsibility is split between the libraries
and the backend, and this metadata is needed.

CUDA/NVCC has -prec-div and -prec-sqrt, but clang doesn't appear to be
aiming for compatibility with those. I don't know if OpenMP has a similar
control.
2023-07-14 18:46:18 -04:00
Craig Topper
85b27ace52 [ARM][AArch64] Add ARM specific builtin for clz that is not undefined for 0 in ubsan.
D152023 made ubsan consider __builtin_clz of 0 undefined regardless of
the target. This ensures portability and matches gcc.

This caused the ACLE intrinsics to also be considered
undefined for 0, since they used the generic builtins
as their implementation.

This patch adds builtins for ARM that ubsan doesn't know about, to make
the behavior defined for 0. Alternatively, I could have added a zero
check to the intrinsics, but the dedicated builtin will give better -O0
codegen.

Fixes #63113.
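
A hedged usage sketch via the ACLE header (assumes arm_acle.h maps __clz
onto the new ARM-specific builtin):

```c
#include <arm_acle.h>
#include <stdint.h>

/* Defined for a zero input now: yields 32 without tripping UBSan. */
unsigned int leading_zeros(uint32_t x) { return __clz(x); }
```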

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D154915
2023-07-12 09:29:18 -07:00
Serge Pavlov
7d6c2e1811 [clang] Use llvm.is_fpclass to implement FP classification functions
Builtin floating-point number classification functions:

    - __builtin_isnan,
    - __builtin_isinf,
    - __builtin_finite, and
    - __builtin_isnormal

now are implemented using `llvm.is_fpclass`.

This change makes the target callback `TargetCodeGenInfo::testFPKind`
unneeded. It is preserved in this change and should be removed later.
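
A sketch of the affected source-level functions (my example):

```c
/* Each of these now lowers to llvm.is_fpclass with the matching test mask. */
int classify(double x) {
  return __builtin_isnan(x) | (__builtin_isinf(x) << 1) |
         (__builtin_isnormal(x) << 2);
}
```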

Differential Revision: https://reviews.llvm.org/D112932
2023-07-11 21:34:53 +07:00
Craig Topper
939f818a66 [RISCV] Split __builtin_riscv_brev8 into _32 and _64 builtin.
Allow the _32 builtin on RV64 since it only needs brev8+sext.w.

Part of an effort to remove 'long' to mean XLen from the builtin
interface.

Matches the proposal here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D154683
2023-07-10 13:01:07 -07:00
Craig Topper
a1b7db3e4c [RISCV] Split __builtin_riscv_xperm4/8 into separate _32 and _64 builtins.
Part of an effort to remove uses of 'long' to mean XLen from the builtin
interfaces.

Also makes the builtin names match https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D154681
2023-07-10 12:18:20 -07:00
Sergio Afonso
63ca93c7d1
[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.

`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.

Differential Revision: https://reviews.llvm.org/D154591
2023-07-10 14:14:16 +01:00
Lucas Prates
2b7ac62606 [AArch64][RCPC3] Add Neon intrinsics for LDAP1 and STL1
This adds new intrinsics to support the LDAP1 and STL1 Advanced SIMD
(Neon) instructions introduced as part of FEAT_LRCPC3.
The new intrinsics `vldap1(q)_lane`/`vstl1(q)_lane` generate IR code
similar to the existing `vld1(q)_lane`/`vst1(q)_lane` ones, but capture
the difference in the atomic release/acquire memory model.

The LLVM code generation changes to ensure that this instruction pair
is lowered to the correct LDAP1/STL1 instructions will be covered in a
separate commit.

Based on a patch by Sam Elliott.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D153128
2023-07-07 12:31:55 +01:00