1856 Commits

Author SHA1 Message Date
Sander de Smalen
d313614b60
[AArch64] Replace LLVM IR function attributes for PSTATE.ZA. (#79166)
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.

Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.

For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0

We have now done the same for ZA, such that we add:
* aarch64_new_za       (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)

This explicitly removes 'pstate' from the name because, with SME2 and the
new ACLE attributes, there is a difference between "sharing ZA" (sharing
the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing
either the ZA or ZT0 register, both part of PSTATE.ZA, with the caller).
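
For illustration, a minimal sketch of how the source-level ACLE keyword
attributes from the linked ACLE PR are expected to map onto these IR
attributes (function names are hypothetical; the mapping is an assumption
for illustration):

```c
// Hedged sketch: each ACLE ZA-state attribute is assumed to lower to the
// corresponding LLVM IR attribute named above.
void reads_za(void) __arm_in("za");        // -> aarch64_in_za
void writes_za(void) __arm_out("za");      // -> aarch64_out_za
void updates_za(void) __arm_inout("za");   // -> aarch64_inout_za
void keeps_za(void) __arm_preserves("za"); // -> aarch64_preserves_za

__arm_new("za") void fresh_za(void) {      // -> aarch64_new_za (definitions only)
  updates_za();
}
```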
2024-02-01 13:37:37 +00:00
Nemanja Ivanovic
67c1c1dbb6
[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919)
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for its specific lowering/code-gen. Also provide code-gen
for PowerPC.

I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
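
As a rough usage sketch (the CPU and feature strings below follow GCC's
PowerPC names and are assumptions here, not taken from this commit):

```c
#include <stdio.h>

int main(void) {
  __builtin_cpu_init();  // initialize CPU-id data on targets that need it
  if (__builtin_cpu_is("power9"))
    puts("running on POWER9");
  if (__builtin_cpu_supports("vsx"))
    puts("VSX available");
  return 0;
}
```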

---------

Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
2024-01-26 11:24:50 -05:00
Vojislav Tomasevic
2a77d92e2e
[clang] Incorrect IR involving the use of bcopy (#79298)
This patch addresses an issue with calls to the bcopy function in a
conditional expression. It is analogous to the already accepted patch
that deals with the same problem for the bzero function [0].

Here is the testcase which illustrates the issue:

```
void bcopy(const void *, void *, unsigned long);
void foo(void);

void test_bcopy() {
  char dst[20];
  char src[20];
  int _sz = 20, len = 20;
  return (_sz
          ? ((_sz >= len)
             ? bcopy(src, dst, len)
             : foo())
          : bcopy(src, dst, len));
}
```

When processing it with clang, the following issue occurs:

  Instruction does not dominate all uses!
    %arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0, !dbg !38
    %cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5, %cond.false3 ], !dbg !33
  fatal error: error in backend: Broken module found, compilation aborted!

This happens because an incorrect phi node is created. It is created
because the bcopy call is lowered to a call to the llvm.memmove
intrinsic, and memmove returns void *. Since llvm.memmove is called in
two places in the same return statement, clang creates a phi node for
the return value in the final basic block, and that phi node is
incorrect. However, bcopy should return void in the first place, so this
phi node is unnecessary. This is what this patch addresses. An
appropriate test is also added, and no existing tests fail when applying
this patch.

Also, this crash only happens when LLVM is configured with the
-DLLVM_ENABLE_ASSERTIONS=On option.

[0] https://reviews.llvm.org/D39746
2024-01-24 09:39:36 -08:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Matthew Devereau
6ba62f4f25
[AArch64][SME2] Refine fcvtu/fcvts/scvtf/ucvtf (#77947)
Rename intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs.

Use llvm_anyvector_ty for both multi-vector returns and operands, so
that the return and operand types can be specified in the intrinsic
call, e.g.

@llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
2024-01-22 15:11:49 +00:00
Piotr Sobczak
57f6a3f7ea
[AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.

* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
2024-01-18 15:14:42 +01:00
Mikael Holmen
e6bd9835d9 [clang][CodeGen] Fix gcc warning about unused variable [NFC]
Without the fix gcc warned with
 ../../clang/lib/CodeGen/CGBuiltin.cpp:1022:19: warning: unused variable 'DRE' [-Wunused-variable]
  1022 |   if (const auto *DRE = dyn_cast<DeclRefExpr>(Base)) {
       |                   ^~~

Fix the warning by removing the unused variable and changing the
"dyn_cast" to "isa".
2024-01-17 13:23:08 +01:00
Bill Wendling
00b6d032a2 [Clang] Implement the 'counted_by' attribute (#76348)
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
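
As a hedged sketch of the intended payoff (the struct and function here
are illustrative, not from the patch; exact __bdos results depend on the
type argument and the count's value):

```c
#include <stddef.h>

struct flex {
  size_t count;
  int elems[] __attribute__((counted_by(count)));
};

/* With counted_by, __builtin_dynamic_object_size can derive the FAM's
   size from p->count instead of giving up, assuming the relationship
   between count and elems holds. */
size_t fam_size(struct flex *p) {
  return __builtin_dynamic_object_size(p->elems, 1); /* ~ p->count * sizeof(int) */
}
```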
2024-01-16 14:26:12 -08:00
Craig Topper
142f270c27 Recommit "[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311)"
With lldb build fix.

Original message:

EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.

This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.

The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.

Fixes #78160.
2024-01-16 13:52:17 -08:00
Craig Topper
f3d534c425 Revert "[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311)"
This reverts commit 4737959d91fab7673b1bb642f88658bb2a24d723.

Missed an lldb update.
2024-01-16 12:39:47 -08:00
Craig Topper
4737959d91
[AST] Use APIntStorage to fix memory leak in EnumConstantDecl. (#78311)
EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.

This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.

The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.

Fixes #78160.
2024-01-16 12:10:38 -08:00
Rashmi Mudduluru
a511c1a9ec
Revert "[Clang] Implement the 'counted_by' attribute (#76348)"
This reverts commit 164f85db876e61cf4a3c34493ed11e8f5820f968.
2024-01-15 18:37:52 -08:00
Bill Wendling
164f85db87 [Clang] Implement the 'counted_by' attribute (#76348)
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
2024-01-10 22:20:31 -08:00
Nico Weber
2dce77201c Revert "[Clang] Implement the 'counted_by' attribute (#76348)"
This reverts commit fefdef808c230c79dca2eb504490ad0f17a765a5.

Breaks check-clang, see
https://github.com/llvm/llvm-project/pull/76348#issuecomment-1886029515

Also revert follow-on "[Clang] Update 'counted_by' documentation"

This reverts commit 4a3fb9ce27dda17e97341f28005a28836c909cfc.
2024-01-10 21:05:19 -05:00
Bill Wendling
4a3fb9ce27 [Clang] Update 'counted_by' documentation
Describe a limitation of the 'counted_by' attribute when used in unions.
Also fix an errant typo.
2024-01-10 15:36:33 -08:00
Bill Wendling
fefdef808c
[Clang] Implement the 'counted_by' attribute (#76348)
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
2024-01-10 15:21:10 -08:00
CarolineConcatto
14e7dac92a
[Clang][LLVM][AArch64]SVE2.1 update the intrinsics according to acle[1] (#76844)
This patch changes the following intrinsics:

```
svst1uwq[_{d}]      replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}]      replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
The 'u' is dropped from the quadword stores because they simply truncate
the quadwords to 32 bits.

```
 svextq_lane[_{d}] replaced by  svextq[_{d}]
```
EXTQ follows the previously defined EXT intrinsics

```
 svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change

[1]https://github.com/ARM-software/acle/pull/257
2024-01-10 17:12:14 +00:00
Sander de Smalen
5055eeea52
[Clang][AArch64] Add missing SME functions to header file. (#75791)
This includes:
* __arm_in_streaming_mode()
* __arm_has_sme()
* __arm_za_disable()
* __svundef_za()
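
A minimal usage sketch of these (header name and exact signatures assumed
from the ACLE; the function below is hypothetical):

```c
#include <arm_sme.h>

// Hedged sketch: query SME availability and streaming mode, then
// explicitly turn off ZA state.
void sketch(void) {
  if (__arm_has_sme() && __arm_in_streaming_mode()) {
    /* ... */
    __arm_za_disable();  // disable PSTATE.ZA, committing any lazy save
  }
}
```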
2024-01-02 09:43:30 +00:00
Dinar Temirbulatov
809f2f3d7d
[AArch64][SME2] Add builtins for FDOT, BFDOT, SUDOT, USDOT, SDOT, UDOT. (#75737)
Add SME2 DOT builtins.
2023-12-21 19:41:24 +00:00
Dinar Temirbulatov
77c5c44b01
[AArch64][SME2] Add SME2 MLA/MLS builtins. (#75584)
Add SME2 MLA/MLS builtins.
2023-12-21 16:42:24 +00:00
Bill Wendling
cca4d6cfd2
Revert counted_by attribute feature (#75857)
There are many issues that popped up with the counted_by feature. The
patch #73730 has grown too large and approval is blocking Linux testing.

Includes reverts of:
commit 769bc11f684d ("[Clang] Implement the 'counted_by' attribute
(#68750)")
commit bc09ec696209 ("[CodeGen] Revamp counted_by calculations
(#70606)")
commit 1a09cfb2f35d ("[Clang] counted_by attr can apply only to C99
flexible array members (#72347)")
commit a76adfb992c6 ("[NFC][Clang] Refactor code to calculate flexible
array member size (#72790)")
commit d8447c78ab16 ("[Clang] Correct handling of negative and
out-of-bounds indices (#71877)")
Partial commit b31cd07de5b7 ("[Clang] Regenerate test checks (NFC)")

Closes #73168
Closes #75173
2023-12-18 15:16:09 -08:00
Paul Walker
dea16ebd26
[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217)
The specialisation will not be valid when ConstantInt gains native
support for vector types.

This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.

Co-authored-by: Nikita Popov <github@npopov.com>
2023-12-18 11:58:42 +00:00
Simon Pilgrim
df3ddd78f6 CGBuiltin - fix gcc Wunused-variable warning. NFC. 2023-12-18 11:51:24 +00:00
Akira Hatanaka
31429e7a89
[CodeGen] Emit a more accurate alignment for non-temporal loads/stores (#75675)
Call EmitPointerWithAlignment to compute the alignment based on the
underlying lvalue's alignment when it's available.
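
A small sketch of the effect (the builtins are clang's nontemporal
builtins; the example and its alignment are illustrative):

```c
// Hedged sketch: the store/load below should now carry align 16 from
// buf's declared alignment rather than a conservative default.
static _Alignas(16) int buf[4];

void write_nt(int v) { __builtin_nontemporal_store(v, &buf[0]); }
int  read_nt(void)   { return __builtin_nontemporal_load(&buf[0]); }
```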
2023-12-17 18:22:44 -08:00
Lei Huang
aaa3f72c1c
[PowerPC] Emit libcall to frexpl for calls to frexp(PPCDoubleDouble) (#75226)
On Linux PowerPC, call the library function ``frexpl`` for calls to
``frexp()`` with input of type PPCDoubleDouble.

Fixes bug: https://github.com/llvm/llvm-project/issues/64426
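
A minimal sketch of an affected call (example assumed, not from the
commit): on Linux PowerPC, long double is PPCDoubleDouble, so the builtin
below now lowers to the frexpl libcall.

```c
#include <math.h>

// Hedged sketch: on return, x == m * 2**(*e), with m in [0.5, 1) for
// nonzero finite x.
long double split(long double x, int *e) {
  return __builtin_frexpl(x, e);
}
```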
2023-12-15 17:23:16 -05:00
CarolineConcatto
f2464ca317
[SVE2.1][Clang][LLVM]Int/FP reduce builtin in Clang and LLVM intrinsic (#69926)
This patch implements the Clang builtins and the LLVM IR intrinsics for
the following:

    // Variants are also available for:
    // _s8, _s16, _u16, _s32, _u32, _s64, _u64, _f16, _f32, _f64
    uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

    // Variants are also available for:
    // _s8, _u16, _s16, _u32, _s32, _u64, _s64
    uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
    uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
    uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

    // Variants are also available for:
    // _s8, _u16, _s16, _u32, _s32, _u64, _s64
    uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
    uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

    // Variants are also available for _f32, _f64
    float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
    float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to PR #257 [1], the reduction instructions use scalable
vectors as input and fixed vectors as output; therefore we changed
SVEEmitter to emit fixed vector types in case the NEON header
(arm_neon.h) is not present.
[1]https://github.com/ARM-software/acle/pull/257

Co-authored-by: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
2023-12-13 15:45:59 +00:00
Dinar Temirbulatov
49b27b150b
[AArch64][SME2] Add builtins to cast svbool from/to svcount. (#74720)
Add builtin 'svreinterpret_b' to cast from svcount_t to svbool_t.
Add builtin 'svreinterpret_c' to cast from svbool_t to svcount_t.
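
A minimal usage sketch (header and compilation flags assumed; these are
the two casts the patch adds):

```c
#include <arm_sve.h>

// Hedged sketch: round-trip between the SME2 predicate-as-counter type
// and the classic SVE predicate type.
svbool_t  as_bool (svcount_t pn) { return svreinterpret_b(pn); }
svcount_t as_count(svbool_t  pg) { return svreinterpret_c(pg); }
```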

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-08 16:38:29 +00:00
James Y Knight
4d4c30a37c
Use Address for CGBuilder's CreateAtomicRMW and CreateAtomicCmpXchg. (#74349)
Update all callers to pass through the Address.

For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
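
For context, a sketch of the assumption being preserved (the builtin is
real; the example itself is illustrative):

```c
// Hedged sketch: for __sync_* builtins the atomic access is assumed to
// be naturally aligned -- 4 bytes here -- and any larger known alignment
// is now passed through to the IR.
static int counter;

int bump(void) {
  return __sync_fetch_and_add(&counter, 1);
}
```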
2023-12-04 13:37:04 -05:00
Ulrich Weigand
c61eb44005 [SystemZ] Implement vector rotate in terms of funnel shift
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics.  To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.

Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
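
A sketch of the underlying equivalence in plain C (not the builtin
itself): a left rotate is a funnel shift with both inputs equal, i.e.
llvm.fshl(x, x, n).

```c
#include <stdint.h>

// Hedged sketch: this is the computation the funnel-shift intrinsic
// performs; optimizers typically recognize this pattern as a rotate.
uint32_t rotl32(uint32_t x, uint32_t n) {
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}
```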
2023-12-04 16:52:00 +01:00
Dominik Adamski
95943d2fab [Flang] Add code-object-version option (#72638)
Information about the code object version can be configured by the user
for the AMD GPU target, and it needs to be placed in the LLVM IR
generated by Flang.

Information about the code object version in the MLIR generated by the
parser can be reused by other tools. There is no need to specify extra
flags if we want to invoke MLIR tools (like fir-opt) separately.

Changes in comparison to a8ac93:
 * added information about required targets for test
   flang/test/Driver/driver-help.f90
2023-11-29 03:01:01 -06:00
Dominik Adamski
f00ffcdb58 Revert "[Flang] Add code-object-version option (#72638)"
This commit causes test errors on buildbots.

This reverts commit a8ac930b99d93b2a539ada7e566993d148899144.
2023-11-28 13:18:46 -06:00
Dominik Adamski
a8ac930b99
[Flang] Add code-object-version option (#72638)
Information about the code object version can be configured by the user
for the AMD GPU target, and it needs to be placed in the LLVM IR
generated by Flang.

Information about the code object version in the MLIR generated by the
parser can be reused by other tools. There is no need to specify extra
flags if we want to invoke MLIR tools (like fir-opt) separately.
2023-11-28 19:57:36 +01:00
Youngsuk Kim
10e483521a
[clang][CodeGen] Remove ptr-to-ptr bitcasts (NFC) (#73020)
Opaque ptr cleanup effort
2023-11-23 11:34:59 -05:00
Momchil Velikov
f335883808
[AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (#70474)
This patch adds a set of SVE2.1 quadword load/store intrinsics:

  * Contiguous zero-extending load to quadword (single vector)

    sv<type>_t svld1uwq[_<typ>](svbool_t, const <type>_t *ptr);
    sv<type>_t svld1uwq_vnum[_<typ>](svbool_t, const <type> *ptr, int64_t vnum);
 
    sv<type>_t svld1udq[_<typ>](svbool_t, const <type>_t *ptr);
    sv<type>_t svld1udq_vnum[_<typ>](svbool_t, const <type>_t *ptr, int64_t vnum);

  * Contiguous truncating store of single vector operand

    void svst1uwq[_<typ>](svbool_t, const <type>_t *ptr, sv<type>_t data);
    void svst1uwq_vnum[_<typ>](svbool_t, const <type>_t *ptr, int64_t vnum, sv<type>_t data);

    void svst1udq[_<typ>](svbool_t, const <type>_t *ptr, sv<type>_t data);
    void svst1udq_vnum[_<typ>](svbool_t, const <type>_t *ptr, int64_t vnum, sv<type>_t data);

  * Gather load quadword

    sv<type>_t svld1q_gather[_u64base]_<typ>(svbool_t pg, svuint64_t zn);
    sv<type>_t svld1q_gather[_u64base]_offset_<typ>(svbool_t pg, svuint64_t zn, int64_t offset);

  * Scatter store quadword

    void svst1q_scatter[_u64base][_<typ>](svbool_t pg, svuint64_t zn, sv<type>_t data);
    void svst1q_scatter[_u64base]_offset[_<typ>](svbool_t pg, svuint64_t zn, int64_t offset, sv<type>_t data);

  * Contiguous load of two, three or four quadword structures.

    sv<type>x2_t svld2q[_<typ>](svbool_t pg, const <type>_t *rn);
    sv<type>x2_t svld2q_vnum[_<typ>](svbool_t pg, const <type>_t *rn, uint64_t vnum);
    sv<type>x3_t svld3q[_<typ>](svbool_t pg, const <type>_t *rn);
    sv<type>x3_t svld3q_vnum[_<typ>](svbool_t pg, const <type>_t *rn, uint64_t vnum);
    sv<type>x4_t svld4q[_<typ>](svbool_t pg, const <type>_t *rn);
    sv<type>x4_t svld4q_vnum[_<typ>](svbool_t pg, const <type>_t *rn, uint64_t vnum);

  * Contiguous store of two, three or four quadword structures.

    void svst2q[_<typ>](svbool_t pg, <type>_t *rn, sv<type>x2_t zt);
    void svst2q_vnum[_<typ>](svbool_t pg, <type>_t *rn, int64_t vnum, sv<type>x2_t zt);
    void svst3q[_<typ>](svbool_t pg, <type>_t *rn, sv<type>x3_t zt);
    void svst3q_vnum[_<typ>](svbool_t pg, <type>_t *rn, int64_t vnum, sv<type>x3_t zt);
    void svst4q[_<typ>](svbool_t pg, <type>_t *rn, sv<type>x4_t zt);
    void svst4q_vnum[_<typ>](svbool_t pg, <type>_t *rn, int64_t vnum, sv<type>x4_t zt);

ACLE spec: https://github.com/ARM-software/acle/pull/257

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-11-21 15:34:59 +00:00
Bill Wendling
d8447c78ab
[Clang] Correct handling of negative and out-of-bounds indices (#71877)
GCC returns 0 for a negative index on an array in a structure. It also
returns 0 for an array index that goes beyond the extent of the array.
In addition, a pointer to a struct field returns that field's size, not
the size of it plus the rest of the struct, unless it's the first field
in the struct.

  struct s {
    int count;
    char dummy;
    int array[] __attribute((counted_by(count)));
  };

  struct s *p = malloc(...);

  p->count = 10;

A __bdos on the elements of p returns:

  __bdos(p, 0) == 30
  __bdos(p->array, 0) == 10
  __bdos(&p->array[0], 0) == 10
  __bdos(&p->array[-1], 0) == 0
  __bdos(&p->array[42], 0) == 0

Also perform some refactoring, putting the "counted_by" calculations in
their own function.
2023-11-20 09:49:20 -08:00
Sam Tebbs
f7b5c25507
[AArch64][SME] Remove immediate argument restriction for svldr and svstr (#68565)
The svldr_vnum and svstr_vnum builtins always modify the base register
and tile slice and provide immediate offsets of zero, even when the
offset provided to the builtin is an immediate. This patch optimises the
output of the builtins when the offset is an immediate, passing it
directly to the instruction so that the base register and tile slice
updates are no longer needed.
2023-11-20 09:57:29 +00:00
Bill Wendling
a76adfb992
[NFC][Clang] Refactor code to calculate flexible array member size (#72790)
The code that calculates the flexible array member size is big enough to
warrant its own method.
2023-11-19 19:25:10 -08:00
Momchil Velikov
96ef623a75
[AArch64] Cast predicate operand of SVE gather loads/scatter stores to the parameter type of the intrinsic (NFC) (#71289)
When emitting LLVM IR for gather loads/scatter stores, the predicate
parameter is cast to a type that depends on the loaded, resp. stored,
type. That's correct for operations where we have a predicate per lane;
however, it is not correct for quadword loads and stores (`LD1Q`,
`ST1Q`), where the predicate is per 128-bit chunk, independent of the
ACLE intrinsic type.

This can be universally handled by cast to the corresponding parameter
type of the intrinsic. The intrinsic itself should be defined in a way
that enforces relations between parameter types.
2023-11-13 16:01:07 +00:00
Jessica Del
b025864af8
[AMDGPU] - Add clang builtins for tied WMMA intrinsics (#70669)
Add clang builtins for the new tied wmma intrinsics. These variations
tie the destination accumulator matrix to the input accumulator matrix.

See https://github.com/llvm/llvm-project/pull/69903 for context.
2023-11-13 13:23:26 +01:00
Fangrui Song
65f2cf25c3 Revert "[CodeGen] -fsanitize=alignment: add cl::opt sanitize-alignment-builtin to disable memcpy instrumentation (#69240)"
This reverts commit e8fe4de64ffb84924c41e54116a04570046eed74.

memcpy/memmove instrumentation for -fsanitize=alignment has been tested
on a huge code base. There were some cleanups but the number does not
justify a workaround.
2023-11-12 22:26:27 -08:00
Bill Wendling
bc09ec6962
[CodeGen] Revamp counted_by calculations (#70606)
Break down the counted_by calculations so that they correctly handle
anonymous structs, which are specified internally as IndirectFieldDecls.

This improves the calculation of __bdos on a different field member in
the struct, and also improves support for __bdos on an index into the
FAM. If the index is further out than the length of the FAM, then we
return __bdos's "can't determine the size" value (zero or negative one,
depending on type).

Also simplify the code to use helper methods to get the field referenced
by counted_by and the flexible array member itself, which also had some
issues with FAMs in sub-structs.
2023-11-09 10:18:17 -08:00
Saiyedul Islam
21861991e7
[OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (#71234)
Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses the
already available global variable "oclc_ABI_version" instead of
"llvm.amdgcn.abi.version".

It also adds some minor fields in the ImplicitArg structure.
2023-11-09 10:34:35 +05:30
Pravin Jagtap
1f21e49870
Revert "Revert "[AMDGPU] const-fold imm operands of (#71669)
amdgcn_update_dpp intrinsic (#71139)""

This reverts commit d1fb9307951319eea3e869d78470341d603c8363 and fixes
the lit test clang/test/CodeGenHIP/dpp-const-fold.hip

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-11-09 10:09:22 +05:30
Mitch Phillips
d1fb930795 Revert "[AMDGPU] const-fold imm operands of amdgcn_update_dpp intrinsic (#71139)"
This reverts commit 32a3f2afe6ea7ffb02a6a188b123ded6f4c89f6c.

Reason: Broke the sanitizer buildbots. More details at
32a3f2afe6
2023-11-08 12:50:53 +01:00
Pravin Jagtap
32a3f2afe6
[AMDGPU] const-fold imm operands of amdgcn_update_dpp intrinsic (#71139)
Operands of `__builtin_amdgcn_update_dpp` need to evaluate to constant
to match the intrinsic requirements.
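
A hedged sketch of what "evaluate to constant" means here (the DPP
control encoding below is a hypothetical macro; the point is that the
trailing operands must fold to integer constants at compile time):

```c
// Hedged sketch: dpp_ctrl/row_mask/bank_mask/bound_ctrl must be integer
// constant expressions, not merely literals at the call site.
#define ROW_SHL1 (0x100 | 1)  /* hypothetical dpp_ctrl encoding */

int sketch(int old, int src) {
  return __builtin_amdgcn_update_dpp(old, src, ROW_SHL1, 0xF, 0xF, 0);
}
```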

Fixes: SWDEV-426822, SWDEV-431138
---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-11-08 15:09:10 +05:30
Noah Goldstein
590884a860 [Clang][CodeGen] Stop emitting alignment assumes for align_{up,down}
Now that `align_{up,down}` use `llvm.ptrmask` (as of #71238), the
assume doesn't preserve any information that is not still easily
re-computable.

Closes #71295
2023-11-07 00:31:04 -06:00
Vlad Serebrennikov
dda8e3de35 [clang][NFC] Refactor ImplicitParamDecl::ImplicitParamKind
This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.
2023-11-06 12:01:09 +03:00
Noah Goldstein
71be514fa0 [Clang][CodeGen] Emit llvm.ptrmask for align_up and align_down
Since PRs #69343 and #67166 we probably have enough support for
`llvm.ptrmask` to make it preferable to the GEP strategy.
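
The semantics involved, as a plain-C sketch (alignment must be a power
of two; clang's actual builtins here are __builtin_align_up and
__builtin_align_down):

```c
#include <stdint.h>

// Hedged sketch: align_down clears the low address bits -- the
// llvm.ptrmask shape -- and align_up rounds up first, then masks.
void *align_down_sketch(void *p, uintptr_t align) {
  return (void *)((uintptr_t)p & ~(align - 1));
}

void *align_up_sketch(void *p, uintptr_t align) {
  return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
}
```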

Closes #71238
2023-11-04 14:20:54 -05:00
Momchil Velikov
9b3bb7a066
[AArch64] Implement reinterpret builtins for SVE vector tuples (#69598)
This patch adds reinterpret builtins as proposed here:
https://github.com/ARM-software/acle/pull/275.

The builtins take the form:

    sv<dst>x<N>_t svreinterpret_<dst>_<src>_x<N>(sv<src>x<N>_t op)

where
  - <src> and <dst> designate the source and the destination type,
    respectively, all pairs chosen from {s8, u8, s16, u16, s32, u32,
    s64, u64, bf16, f16, f32, f64}
  - <N> designates the number of tuple elements, 2, 3 or 4

A short (overloaded) form is also provided, where the destination type
is explicitly designated and the source type is deduced from the
parameter type. These take the form

    sv<dst>x<N>_t svreinterpret_<dst>(sv<src>x<N>_t op)

For example:

    svuint16x2_t svreinterpret_u16_s32_x2(svint32x2_t op);
    svuint16x2_t svreinterpret_u16(svint32x2_t op);
2023-11-03 11:45:08 +00:00
Kerry McLaughlin
8f59c168a9
[AArch64][Clang] Refactor code to emit SVE & SME builtins (#70959)
This patch removes duplicated code in EmitAArch64SVEBuiltinExpr and
EmitAArch64SMEBuiltinExpr by creating a new function called
GetAArch64SVEProcessedOperands which handles splitting up multi-vector
arguments using vector extracts.

These changes are non-functional.
2023-11-02 15:47:37 +00:00