2168 Commits

Author SHA1 Message Date
Jessica Del
b025864af8
[AMDGPU] - Add clang builtins for tied WMMA intrinsics (#70669)
Add clang builtins for the new tied wmma intrinsics. These variations
tie the destination accumulator matrix to the input accumulator matrix.

See https://github.com/llvm/llvm-project/pull/69903 for context.
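For context, a hypothetical call site might look like this (a hedged sketch: the builtin name and vector width below are modeled on the existing untied wmma builtins and are assumptions, not taken from this commit):

```
// Sketch, wave32: the tied variant preserves the unused half of the
// destination accumulator instead of clobbering it.
typedef _Float16 half16 __attribute__((ext_vector_type(16)));

half16 wmma_tied(half16 a, half16 b, half16 acc) {
  return __builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32(a, b, acc, 1);
}
```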
2023-11-13 13:23:26 +01:00
Fangrui Song
65f2cf25c3 Revert "[CodeGen] -fsanitize=alignment: add cl::opt sanitize-alignment-builtin to disable memcpy instrumentation (#69240)"
This reverts commit e8fe4de64ffb84924c41e54116a04570046eed74.

memcpy/memmove instrumentation for -fsanitize=alignment has been tested
on a huge code base. There were some cleanups, but the number of issues
does not justify a workaround.
2023-11-12 22:26:27 -08:00
Bill Wendling
bc09ec6962
[CodeGen] Revamp counted_by calculations (#70606)
Break down the counted_by calculations so that they correctly handle
anonymous structs, which are specified internally as IndirectFieldDecls.

This improves the calculation of __bdos on a different field member in the
struct, and also improves support for __bdos with an index into the FAM. If
the index is further out than the length of the FAM, then we return __bdos's
"can't determine the size" value (zero or negative one, depending on type).

Also simplify the code to use helper methods to get the field referenced
by counted_by and the flexible array member itself, which also had some
issues with FAMs in sub-structs.
2023-11-09 10:18:17 -08:00
Saiyedul Islam
21861991e7
[OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (#71234)
Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses the
already available global variable "oclc_ABI_version" instead of
"llvm.amdgcn.abi.version".

It also adds some minor fields to the ImplicitArg structure.
2023-11-09 10:34:35 +05:30
Pravin Jagtap
1f21e49870
Revert "Revert "[AMDGPU] const-fold imm operands of (#71669)
amdgcn_update_dpp intrinsic (#71139)""

This reverts commit d1fb9307951319eea3e869d78470341d603c8363 and fixes
the lit test clang/test/CodeGenHIP/dpp-const-fold.hip

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-11-09 10:09:22 +05:30
Mitch Phillips
d1fb930795 Revert "[AMDGPU] const-fold imm operands of amdgcn_update_dpp intrinsic (#71139)"
This reverts commit 32a3f2afe6ea7ffb02a6a188b123ded6f4c89f6c.

Reason: Broke the sanitizer buildbots. More details at
32a3f2afe6
2023-11-08 12:50:53 +01:00
Pravin Jagtap
32a3f2afe6
[AMDGPU] const-fold imm operands of amdgcn_update_dpp intrinsic (#71139)
Operands of `__builtin_amdgcn_update_dpp` need to evaluate to constants
to match the intrinsic requirements.
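For illustration, a conforming call could look like this (a hedged sketch; the operand meanings are abbreviated in the comment and the control values are arbitrary):

```
// Sketch: every control operand below is an integer constant expression,
// which is what the intrinsic requires after const-folding.
int mov_dpp(int src) {
  // (old, src, dpp_ctrl, row_mask, bank_mask, bound_ctrl)
  return __builtin_amdgcn_update_dpp(0, src, 0x101, 0xF, 0xF, 0 == 0);
}
```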

Fixes: SWDEV-426822, SWDEV-431138
---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-11-08 15:09:10 +05:30
Noah Goldstein
590884a860 [Clang][CodeGen] Stop emitting alignment assumes for align_{up,down}
Now that `align_{up,down}` use `llvm.ptrmask` (as of #71238), the
assume doesn't preserve any information that is not still easily
re-computable.

Closes #71295
2023-11-07 00:31:04 -06:00
Vlad Serebrennikov
dda8e3de35 [clang][NFC] Refactor ImplicitParamDecl::ImplicitParamKind
This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaration. This is useful for `preferred_type` annotations on bit-fields.
2023-11-06 12:01:09 +03:00
Noah Goldstein
71be514fa0 [Clang][CodeGen] Emit llvm.ptrmask for align_up and align_down
Since PRs #69343 and #67166 we probably have enough support for
`llvm.ptrmask` to make it preferable to the GEP strategy.

Closes #71238
2023-11-04 14:20:54 -05:00
Momchil Velikov
9b3bb7a066
[AArch64] Implement reinterpret builtins for SVE vector tuples (#69598)
This patch adds reinterpret builtins as proposed here:
https://github.com/ARM-software/acle/pull/275.

The builtins take the form:

    sv<dst>x<N>_t svreinterpret_<dst>_<src>_x<N>(sv<src>x<N>_t op)

where
- <src> and <dst> designate the source and the destination type,
respectively, all pairs chosen from {s8, u8, s16, u16, s32, u32, s64,
u64, bf16, f16, f32, f64}
- <N> designates the number of tuple elements, 2, 3 or 4

A short (overloaded) form is also provided, where the destination type is
explicitly designated and the source type is deduced from the parameter
type. These take the form

    sv<dst>x<N>_t svreinterpret_<dst>(sv<src>x<N>_t op)

For example:

    svuint16x2_t svreinterpret_u16_s32_x2(svint32x2_t op);
    svuint16x2_t svreinterpret_u16(svint32x2_t op);
2023-11-03 11:45:08 +00:00
Kerry McLaughlin
8f59c168a9
[AArch64][Clang] Refactor code to emit SVE & SME builtins (#70959)
This patch removes duplicated code in EmitAArch64SVEBuiltinExpr and
EmitAArch64SMEBuiltinExpr by creating a new function called
GetAArch64SVEProcessedOperands which handles splitting up multi-vector
arguments using vector extracts.

These changes are non-functional.
2023-11-02 15:47:37 +00:00
Kerry McLaughlin
e2550b7aa0 Revert "[AArch64][Clang] Refactor code to emit SVE & SME builtins (#70662)"
This reverts commit c34efe3c2734629b925d9411b3c86a710911a93a.
2023-11-01 15:57:14 +00:00
Kerry McLaughlin
c34efe3c27
[AArch64][Clang] Refactor code to emit SVE & SME builtins (#70662)
This patch removes duplicated code in EmitAArch64SVEBuiltinExpr and
EmitAArch64SMEBuiltinExpr by creating a new function called
GetAArch64SVEProcessedOperands which handles splitting up multi-vector
arguments using vector extracts.

These changes are non-functional.
2023-11-01 15:21:08 +00:00
Youngsuk Kim
bc44e6e7c6 [clang] Remove no-op ptr-to-ptr bitcasts (NFC)
Opaque pointer cleanup effort.
2023-11-01 09:06:15 -05:00
Serge Pavlov
fc7198b799
[clang] Additional FP classification functions (#69041)
The C language standard defines the library functions `iszero`,
`issignaling` and `issubnormal`, which did not have counterparts among
the clang builtin functions. This change adds the new builtins:

    __builtin_iszero
    __builtin_issubnormal
    __builtin_issignaling

They provide builtin implementation for the missing standard functions.
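For example (a minimal usage sketch):

```
// Sketch: classify a double using the new builtins, without libm.
int classify(double x) {
  if (__builtin_iszero(x))      return 0;
  if (__builtin_issubnormal(x)) return 1;
  if (__builtin_issignaling(x)) return 2;
  return 3;
}
```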

Pull request: https://github.com/llvm/llvm-project/pull/69041
2023-11-01 12:10:54 +07:00
Vlad Serebrennikov
49fd28d960 [clang][NFC] Refactor ArrayType::ArraySizeModifier
This patch moves `ArraySizeModifier` before the `Type` declaration so that it's complete at the `ArrayTypeBitfields` declaration. It's also converted to a scoped enum along the way.
2023-10-31 18:06:34 +03:00
Qiu Chaofan
de7c006832
[PowerPC] Fix use of FPSCR builtins in smmintrin.h (#67299)
smmintrin.h uses __builtin_mffs, __builtin_mffsl, __builtin_mtfsf and
__builtin_set_fpscr_rn. This patch replaces those uses with their
ppc-prefixed counterparts and implements the missing ones.
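A hedged sketch of the substitution (assuming the ppc-prefixed spelling mirrors the generic one):

```
// Sketch: read the FPSCR via the ppc-prefixed builtin rather than the
// generic __builtin_mffs spelling previously used in smmintrin.h.
double read_fpscr(void) {
  return __builtin_ppc_mffs();
}
```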
2023-10-26 15:56:32 +08:00
Rana Pratap Reddy
13ea1146a7
[AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (#69567)
Currently __builtin_amdgcn_read_exec_hi lowers to llvm.read_register;
this patch lowers it to use amdgcn_ballot instead.
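User code is unchanged by the lowering; only the generated IR differs (a sketch, assuming the builtin returns a 32-bit unsigned value):

```
// Sketch: now lowers via amdgcn_ballot instead of llvm.read_register.
unsigned int exec_hi(void) {
  return __builtin_amdgcn_read_exec_hi();
}
```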
2023-10-26 10:26:11 +05:30
Caroline Concatto
e4b75d836d [Clang][SVE2.1] Add builtins for Multi-vector load and store
Patch by: David Sherwood <david.sherwood@arm.com>

As described in: https://github.com/ARM-software/acle/pull/257

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D151433
2023-10-19 15:11:58 +00:00
Caroline Concatto
ba47bc7fd4 [Clang][SVE2.1] Add pfalse builtin
As described in: https://github.com/ARM-software/acle/pull/257

Patch by: Sander de Smalen <sander.desmalen@arm.com>

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D151199
2023-10-19 08:55:32 +00:00
Fangrui Song
e8fe4de64f
[CodeGen] -fsanitize=alignment: add cl::opt sanitize-alignment-builtin to disable memcpy instrumentation (#69240)
Deploying #67766 to a large internal codebase uncovers many bugs (many
are probably benign but need cleaning up). There are also issues in
high-profile open-source projects like v8. Add a cl::opt to disable
builtin instrumentation for -fsanitize=alignment to help large codebase
users.

In the long term, this cl::opt option may still be useful to debug
-fsanitize=alignment instrumentation on builtins, so we probably want to
keep it around.
2023-10-18 11:43:36 -07:00
Caroline Concatto
1b93e15bcd [Clang][SVE2p1] Add svpsel builtins
As described in: https://github.com/ARM-software/acle/pull/257

Patch by: Sander de Smalen <sander.desmalen@arm.com>

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D151197
2023-10-18 15:05:26 +00:00
Caroline Concatto
7cad5a9eb4 [Clang][SVE2.1] Add svpext builtins
As described in: https://github.com/ARM-software/acle/pull/257

Reviewed By: hassnaa-arm

Differential Revision: https://reviews.llvm.org/D151081
2023-10-17 16:15:22 +00:00
Sam Tebbs
4b8f23e93d
[AArch64][SME] Remove immediate argument restriction for svldr and svstr (#68908)
The svldr_vnum_za and svstr_vnum_za builtins/intrinsics currently
require that the vnum argument be an immediate, but since vnum is used
to modify the base register via a mul and add, that restriction is not
necessary. This patch removes that restriction.
2023-10-17 16:02:36 +01:00
Alex Voicu
dd5d65adb6 [HIP][Clang][CodeGen] Add CodeGen support for hipstdpar
This patch adds the CodeGen changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. This change relaxes restrictions on what gets emitted on the device path when compiling in `hipstdpar` mode:

1. Unless a function is explicitly marked `__host__`, it will get emitted, whereas before only `__device__` and `__global__` functions would be emitted;
2. Unsupported builtins are ignored as opposed to being marked as an error, as the decision on their validity is deferred to the `hipstdpar` specific code selection pass;
3. We add a `hipstdpar` specific pass to the opt pipeline, independent of optimisation level:
    - When compiling for the host, iff the user requested it via the `--hipstdpar-interpose-alloc` flag, we add a pass which replaces canonical allocation / deallocation functions with accelerator aware equivalents.

A test to validate that unannotated functions get correctly emitted is added as well.

Reviewed by: yaxunl, efriedma

Differential Revision: https://reviews.llvm.org/D155850
2023-10-17 11:41:36 +01:00
Bill Wendling
769bc11f68
[Clang] Implement the 'counted_by' attribute (#68750)
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound
sanitizer and the '__builtin_dynamic_object_size' builtin.

This example specifies that the flexible array member 'array' has
the number of elements allocated for it in 'count':

  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };

This establishes a relationship between 'array' and 'count',
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained through changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count;
  }

  void use_foo(int index) {
    p->count += 42;
    p->array[index] = 0; /* The sanitizer cannot properly check this access */
  }
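Conversely, when the relationship is maintained,
'__builtin_dynamic_object_size' can derive the array's size from
'p->count' (a hedged sketch of the intended behavior, not a test from
this change):

  /* Sketch: with counted_by in place and p->count kept accurate, __bdos
     can evaluate to p->count * sizeof(struct bar *). */
  size_t array_size(struct foo *p) {
    return __builtin_dynamic_object_size(p->array, 1);
  }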

Reviewed By: nickdesaulniers, aaron.ballman

Differential Revision: https://reviews.llvm.org/D148381
2023-10-14 04:18:02 -07:00
Amy Huang
e220398cc3
[MSVC, ARM64] Add __prefetch intrinsic (#67174)
Implement the __prefetch intrinsic.

MSVC docs:
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170

Bug: https://github.com/llvm/llvm-project/issues/65405
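A minimal usage sketch (signature per the MSVC docs linked above):

```
// Sketch: request that the line containing addr be fetched into cache.
void warm(const void *addr) {
  __prefetch(addr);
}
```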
2023-10-13 13:34:15 -07:00
Fangrui Song
792674400f -fsanitize=alignment: check memcpy/memmove arguments (#67766)
The -fsanitize=alignment implementation follows the model that we allow
forming unaligned pointers but disallow accessing unaligned pointers.
See [RFC: Enforcing pointer type alignment in Clang](https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html)
for detail.

memcpy is a memory access, and we require an `int *` argument to be aligned.
Similar to https://reviews.llvm.org/D9673, emit a -fsanitize=alignment check
for arguments of builtin memcpy and memmove functions to catch misaligned loads like:
```
// Check the alignment of a but ignore the alignment of b
void unaligned_load(int *a, void *b) { memcpy(a, b, sizeof(*a)); }
```

For a reference parameter, we emit a -fsanitize=alignment check as well, which
can be optimized out by InstCombinePass. We rely on the call site
`TCK_ReferenceBinding` check instead.
```
// The alignment check of a will be optimized out.
void unaligned_load(int &a, void *b) { memcpy(&a, b, sizeof(a)); }
```

The diagnostic message looks like
```
runtime error: store to misaligned address [[PTR:0x[0-9a-f]*]] for type 'int *'
```

We could use a better message for memcpy, but we don't do it for now as it
would require a new check name like misaligned-pointer-use, which is probably
not necessary. *RFC: Enforcing pointer type alignment in Clang* is not well
documented, but this patch does not intend to change that.

Technically builtin memset functions can be checked for -fsanitize=alignment as
well, but it does not seem too useful.
2023-10-09 23:02:07 -07:00
alexfh
67b675ee55
Revert "[Clang] Implement the 'counted_by' attribute" (#68603)
This reverts commit 9a954c693573281407f6ee3f4eb1b16cc545033d, which
causes clang crashes when compiling with `-fsanitize=bounds`. See

9a954c6935 (commitcomment-129529574)
for details.
2023-10-09 20:53:48 +02:00
Bill Wendling
9a954c6935 [Clang] Implement the 'counted_by' attribute
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound sanitizer
and the '__builtin_dynamic_object_size' builtin.

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };

This establishes a relationship between 'array' and 'count', specifically
that 'p->array' must have *at least* 'p->count' number of elements available.
It's the user's responsibility to ensure that this relationship is maintained
through changes to the structure.

In the following, the allocated array erroneously has fewer elements than
what's specified by 'p->count'. This would result in an out-of-bounds access
not being detected:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }

The next example updates 'p->count', breaking the relationship requirement that
'p->array' must have at least 'p->count' number of elements available:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count;
  }

  void use_foo(int index) {
    p->count += 42;
    p->array[index] = 0; /* The sanitizer cannot properly check this access */
  }

Reviewed By: nickdesaulniers, aaron.ballman

Differential Revision: https://reviews.llvm.org/D148381
2023-10-04 18:26:15 -07:00
Pavel Iliin
8ec50d6446 [AArch64] Fix FMV ifunc resolver usage on old Android APIs. Rename internal compiler-rt FMV functions.
The patch fixes Function Multi Versioning feature detection by the ifunc
resolver on Android API levels < 30.
Ifunc hwcaps parameters are not supported on Android API levels 23-29,
so all CPU features are set unsupported if they were not initialized
before the ifunc resolver call.
There is no support for ifunc on Android API levels < 23, so Function
Multi Versioning is disabled in this case.

Also, use a two-underscore prefix for FMV runtime support functions to
avoid conflicts with user-program ones.

Differential Revision: https://reviews.llvm.org/D158641
2023-09-29 17:10:48 +01:00
Fangrui Song
0d8b864829 CGBuiltin: emit llvm.abs.* instead of neg+icmp+select for abs
instcombine will combine neg+icmp+select to llvm.abs.*. Let's just emit
llvm.abs.* in the first place.
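For instance (a sketch; the exact IR shown in the comment is an assumption about the new lowering):

```
// Sketch: this now emits  call i32 @llvm.abs.i32(i32 %x, i1 true)
// directly, instead of the old neg+icmp+select sequence.
int iabs(int x) {
  return __builtin_abs(x);
}
```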
2023-09-27 21:29:56 -07:00
Phoebe Wang
31631d307f
[X86][FP16] Add missing handling for FP16 constrained cmp intrinsics (#67400) 2023-09-26 19:27:57 +08:00
Björn Pettersson
b4858c634e
[clang][CodeGen] Simplify code based on opaque pointers (#65624)
- Update CodeGenTypeCache to use a single union for all pointers in
  address space zero.
- Introduce a UnqualPtrTy in CodeGenTypeCache, and use that (for
  example instead of llvm::PointerType::getUnqual) in some places.
- Drop some redundant bit/pointer casts from ptr to ptr.
2023-09-25 11:21:24 +02:00
Carlos Eduardo Seo
7523550853
[Clang][CodeGen] Add __builtin_bcopy (#67130)
Add __builtin_bcopy to the list of GNU builtins. Its absence was causing
a series of test failures in glibc.

Adjust the tests to reflect the changes in codegen.

Fixes #51409.
Fixes #63065.
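A minimal sketch of the builtin (note bcopy's src-before-dst argument order, the reverse of memcpy):

```
#include <stddef.h>

// Sketch: copies n bytes and, like memmove, tolerates overlap.
void copy_bytes(const void *src, void *dst, size_t n) {
  __builtin_bcopy(src, dst, n);
}
```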
2023-09-24 11:58:14 -03:00
Amy Huang
03c698a431
[MSVC, ARM64] Add _Copy* and _Count* intrinsics (#66554)
Implement the _Count* and _Copy* Windows ARM intrinsics:

```
double _CopyDoubleFromInt64(__int64)
float _CopyFloatFromInt32(__int32)
__int32 _CopyInt32FromFloat(float)
__int64 _CopyInt64FromDouble(double)
unsigned int _CountLeadingOnes(unsigned long)
unsigned int _CountLeadingOnes64(unsigned __int64)
unsigned int _CountLeadingSigns(long)
unsigned int _CountLeadingSigns64(__int64)
unsigned int _CountLeadingZeros(unsigned long)
unsigned int _CountLeadingZeros64(unsigned __int64)
unsigned int _CountOneBits(unsigned long)
unsigned int _CountOneBits64(unsigned __int64)
```

Full list of intrinsics here:
[https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics](https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics)

Bug: [65405](https://github.com/llvm/llvm-project/issues/65405)
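For example, the _Copy* intrinsics give a bit-preserving reinterpretation between float and integer values (a short usage sketch):

```
// Sketch: round-trip a float's bit pattern through a 32-bit integer.
__int32 float_bits(float f)   { return _CopyInt32FromFloat(f); }
float   bits_float(__int32 i) { return _CopyFloatFromInt32(i); }
```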
2023-09-21 14:34:59 -07:00
Zahira Ammarguellat
a292e7edf8
Fix math-errno issue (#66381)
Update handling of math errno. This change updates the logic for
generation of math intrinsics in place of math library function calls.
The previous logic https://reviews.llvm.org/D151834 was incorrectly
using intrinsics when math errno handling was needed at optimization
levels above -O0.
This also fixes the issue mentioned in https://reviews.llvm.org/D151834
by @uabelho.
This is joint work with Andy Kaylor (@andykaylor).
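The distinction matters in code like the following (a hedged sketch of the intended behavior):

```
#include <errno.h>
#include <math.h>

// Sketch: with -fmath-errno in effect, this must remain a libm call
// (so errno is set on a domain error) rather than becoming llvm.sqrt,
// even above -O0.
double checked_sqrt(double x) {
  errno = 0;
  double r = sqrt(x);
  return errno ? -1.0 : r;
}
```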
2023-09-19 09:13:02 -04:00
Matt Arsenault
ddc3346a6b
clang/AMDGPU: Fix accidental behavior change for __builtin_amdgcn_ldexph (#66340) 2023-09-14 18:15:44 +03:00
CarolineConcatto
ee31ba0dd9
[AArch64][SME]Update intrinsic interface for ld1/st1 (#65582)
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. Slice specifies the ZA slice number directly and needs to be
explicitly implemented by the "user" with the base register plus the
immediate offset.

[1]https://github.com/ARM-software/acle/pull/225/files
2023-09-13 15:24:09 +01:00
Joseph Huber
49ff6a96a7
[Clang] Define AMDGPU ABI when referenced in CodeGen for ABI "none" (#66162)
Summary:
We use the `llvm.amdgcn.abi.version` variable to control code generation.
This is emitted in every module now to indicate what should be used when
compiling. Previously, the logic caused us to emit an external reference
to this variable when creating the code for the `none` type. This would
then cause us not to emit the actual definition. This patch refines the
logic to create the external reference, and then update it if it is
found unset by the time we emit the global. I had to remove the
reference to `GetOrCreateLLVMGlobal` because it did not accept the
proper address space.
2023-09-13 08:31:31 -05:00
CarolineConcatto
dc8d2ecc5e
[AArch64][SME]Update intrinsic interface for read/write (#65594)
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. This patch is the second of 3 patches to update the interface.

Slice specifies the ZA slice number directly and needs to be explicitly
implemented by the "user" with the base register plus the immediate
offset.

[1]https://github.com/ARM-software/acle/pull/225/files
2023-09-12 18:08:57 +01:00
CarolineConcatto
7b8d4eff02
[AArch64][SME]Update intrinsic interface for ldr/str (#65593)
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. 

[1]https://github.com/ARM-software/acle/pull/225/files
2023-09-12 17:31:51 +01:00
Max Iyengar
dbeb3d029d Add missing vrnd intrinsics
This patch adds 8 missing intrinsics as specified in the Arm ACLE document section 2.12.1.1: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#rounding-3

The intrinsics implemented are:

  - vrnd32z_f64
  - vrnd32zq_f64
  - vrnd64z_f64
  - vrnd64zq_f64
  - vrnd32x_f64
  - vrnd32xq_f64
  - vrnd64x_f64
  - vrnd64xq_f64

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D158626
2023-09-11 12:59:18 +01:00
Matt Arsenault
6a08cf12d9 clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic
https://reviews.llvm.org/D157911
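Usage sketch:

```
// Sketch: lowers to the new llvm.exp10.* intrinsic rather than a
// libcall on targets that support it.
double pow10_of(double x) {
  return __builtin_exp10(x);
}
```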
2023-09-09 23:14:12 +03:00
Zahira Ammarguellat
2c93e3c1c8 Take math-errno into account with '#pragma float_control(precise,on)' and
'__attribute__((optnone))'.

Differential Revision: https://reviews.llvm.org/D151834
2023-09-08 09:48:53 -04:00
Saiyedul Islam
f616c3eeb4
[OpenMP][DeviceRTL][AMDGPU] Support code object version 5
Update DeviceRTL and the AMDGPU plugin to support code
object version 5. Default is code object version 4.

CodeGen for __builtin_amdgcn_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via an -Xclang option to generate ABI-agnostic code.

Generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the ABI version, which is available during linking of the
user's OpenMP code. The load of this constant gets
eliminated during linking.

The AMDGPU plugin queries the ELF for the code object version
and then prepares various implicitargs accordingly.

Differential Revision: https://reviews.llvm.org/D139730

Reviewed By: jhuber6, yaxunl
2023-08-29 06:35:44 -05:00
Kazu Hirata
722474969e [clang] Use SmallDenseMap::contains (NFC) 2023-08-27 08:26:50 -07:00
Fangrui Song
27da15381c [X86] __builtin_cpu_supports: support x86-64{,-v2,-v3,-v4}
GCC 12 (https://gcc.gnu.org/PR101696) allows
__builtin_cpu_supports("x86-64") (and -v2 -v3 -v4).
This patch ports the feature.

* Add `FEATURE_X86_64_{BASELINE,V2,V3,V4}` to enum ProcessorFeatures,
  but keep CPU_FEATURE_MAX unchanged to make
  FeatureInfos/FeatureInfos_WithPLUS happy.
* Change validateCpuSupports to allow `x86-64{,-v2,-v3,-v4}`
* Change getCpuSupportsMask to return `std::array<uint32_t, 4>` where
  `x86-64{,-v2,-v3,-v4}` set bits `FEATURE_X86_64_{BASELINE,V2,V3,V4}`.
* `target("x86-64")` and `cpu_dispatch(x86_64)` are invalid. Tested by commit 9de3b35ac9159d5bae6e6796cb91e4f877a07189

Close https://github.com/llvm/llvm-project/issues/59961
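A short usage sketch of the ported feature:

```
// Sketch: dispatch on a whole microarchitecture level rather than on
// individual feature bits.
int have_v3(void) {
  return __builtin_cpu_supports("x86-64-v3");
}
```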

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D158811
2023-08-25 20:56:25 -07:00
Yaxun (Sam) Liu
63f0833ec4 [clang] Fix missing contract flag in sqrt intrinsic
The fp options specified through pragmas are already encoded in Expr.

This patch takes the same approach used by clang codegen to emit
fastmath flags for fadd insts, basically using RAII to set the
current fastmath flags in the IRBuilder, which are then used to emit
the sqrt intrinsic.

Fixes: https://github.com/llvm/llvm-project/issues/64653
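A sketch of the kind of code affected (the pragma spelling below is one way to set the fp options; the issue link above has the original reproducer):

```
// Sketch: the contract flag implied by the pragma should now be carried
// onto the llvm.sqrt intrinsic emitted for __builtin_sqrtf.
float hyp(float a, float b) {
#pragma clang fp contract(fast)
  return __builtin_sqrtf(a * a + b * b);
}
```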
2023-08-24 19:15:17 -04:00