2139 Commits

Author SHA1 Message Date
Nikita Popov
b384d6d6cc
[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100)
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.

This reduces time to build clang by ~0.5% in terms of instructions
retired.
2025-04-03 08:04:19 +02:00
Jonathan Thackray
a1a74c9e80
[NFC][clang] Remove superfluous header files after refactor in #132252 (#132495)
Remove superfluous header files after refactor in #132252
2025-03-26 14:45:00 +00:00
Jonathan Thackray
7f920e2e5f
[NFC][clang] Split clang/lib/CodeGen/CGBuiltin.cpp into target-specific files (#132252)
clang/lib/CodeGen/CGBuiltin.cpp is over 1MB long (>23k LoC), and can
take minutes to recompile (depending on compiler and host system) when
modified, and 5 seconds for clangd to update for every edit. Splitting
this file was discussed in this thread:

   https://discourse.llvm.org/t/splitting-clang-s-cgbuiltin-cpp-over-23k-lines-long-takes-1min-to-compile/

and the idea has received a number of +1 votes, hence this change.
2025-03-21 19:09:39 +00:00
Phoebe Wang
09feaa9261
Revert "[X86][AVX10.2] Support YMM rounding new instructions (#101825)" (#132362)
This reverts commit 0dba5381d8c8e4cadc32a067bf2fe5e3486ae53d.

YMM rounding was removed from AVX10 whitepaper. Ref:
https://cdrdv2.intel.com/v1/dl/getContent/784343

The MINMAX and SATURATING CONVERT instructions will be removed as a
follow up.
2025-03-21 20:12:57 +08:00
Rahul Joshi
88a51d2392
[Clang][NFC] Code cleanup in CGBuiltin.cpp (#132060)
- Use `Intrinsic::` directly instead of `llvm::Intrinsic::`.
- Eliminate redundant `nullptr` for some `CreateIntrinsic` calls.
- Eliminate redundant `ArrayRef` casts.
- Use C++17 structured binding instead of `std::tie`.
2025-03-20 15:46:28 -07:00
Aaron Ballman
d781ac1cf0
[C23] Add __builtin_c23_va_start (#131166)
This builtin is supported by GCC and is a way to improve diagnostic
behavior for va_start in C23 mode. C23 no longer requires a second
argument to the va_start macro in support of variadic functions with no
leading parameters. However, we still want to diagnose passing more than
two arguments, or diagnose when passing something other than the last
parameter in the variadic function.

This also updates the freestanding <stdarg.h> header to use the new
builtin, same as how GCC works.

Fixes #124031
2025-03-15 11:01:53 -04:00
Juan Manuel Martinez Caamaño
7decd04626
[Clang] Add __builtin_elementwise_exp10 in the same fashion as exp/exp2 (#130746)
Clang has __builtin_elementwise_exp and __builtin_elementwise_exp2
intrinsics, but no __builtin_elementwise_exp10.

There doesn't seem to be a good reason not to expose the exp10 flavour
of this intrinsic too.

This commit introduces this intrinsic following the same pattern as the
exp and exp2 versions.

Fixes: SWDEV-519541
2025-03-12 09:20:29 +01:00
Benjamin Maxwell
fb397ab1e5
Reland "[clang] Lower modf builtin using llvm.modf intrinsic" (#130761)
Reverts
c40f0fe434

Original description:
This updates the existing modf[f|l] builtin to be lowered via the
llvm.modf.* intrinsic (rather than directly to a library call).

The Windows 32-bit x86 missing `modff` symbol issue should have been
solved in: https://github.com/llvm/llvm-project/pull/130636.
2025-03-11 14:55:33 +00:00
Hans Wennborg
c40f0fe434 Revert "Reland "[clang] Lower modf builtin using llvm.modf intrinsic" (#129885)"
This broke modff calls on 32-bit x86 Windows. See comment on the PR.

> This updates the existing modf[f|l] builtin to be lowered via the
> llvm.modf.* intrinsic (rather than directly to a library call).
>
> The legalization issues exposed by the original PR (#126750) should have
> been fixed in #128055 and #129264.

This reverts commit cd1d9a8fab05524a27ffdb251f6def37786b5cc1.
2025-03-10 16:35:03 +01:00
Chris B
e85e29c299
[HLSL] select scalar overloads for vector conditions (#129396)
This PR adds scalar/vector overloads for vector conditions to the
`select` builtin, and updates the sema checking and codegen to allow
scalars to extend to vectors.

Fixes #126570
2025-03-09 16:01:12 -05:00
Matt Arsenault
bfea84946d
clang: Hack around opencl enqueue_block using wrong ABI for aggregrate (#130011)
EmitAggExprToLValue started wrapping the temporary alloca in an
addrspacecast
at some point. We take the direct type from this as the pointer argument
for the
runtime function type, but this isn't correct. Technically, we should be
querying
the target's ABI for what IR to produce for this sequence. The
assumption seems to
always have been that this will be indirectly passed with byval (or
byref).

I started working on a patch to go through the ABI handling, but it
seems to
require more time and/or clang expertise than I have at the moment.
2025-03-06 23:13:28 +07:00
Benjamin Maxwell
cd1d9a8fab
Reland "[clang] Lower modf builtin using llvm.modf intrinsic" (#129885)
Reverts llvm/llvm-project#127987

Original description:
This updates the existing modf[f|l] builtin to be lowered via the
llvm.modf.* intrinsic (rather than directly to a library call).

The legalization issues exposed by the original PR (#126750) should have
been fixed in #128055 and #129264.
2025-03-06 09:43:59 +00:00
Deric C.
b4ecebe745
[HLSL] [DXIL] Implement the AddUint64 HLSL function and the UAddc DXIL op (#127137)
Fixes #99205.

- Implements the HLSL intrinsic `AddUint64` used to perform unsigned
64-bit integer addition by using pairs of unsigned 32-bit integers
instead of native 64-bit types
- The LLVM intrinsic `uadd_with_overflow` is used in the implementation
of `AddUint64` in `CGBuiltin.cpp`
- The DXIL op `UAddc` was defined in `DXIL.td`, and a lowering of the
LLVM intrinsic `uadd_with_overflow` to the `UAddc` DXIL op was
implemented in `DXILOpLowering.cpp`

Notes:
- `__builtin_addc` was not able to be used to implement `AddUint64` in
`hlsl_intrinsics.h` because its `CarryOut` argument is a pointer, and
pointers are not supported in HLSL
- A lowering of the LLVM intrinsic `uadd_with_overflow` to SPIR-V
[already
exists](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/SPIRV/llvm-intrinsics/uadd.with.overflow.ll)
- When lowering the LLVM intrinsic `uadd_with_overflow` to the `UAddc`
DXIL op, the anonymous struct type `{ i32, i1 }` is replaced with a
named struct type `%dx.types.i32c`. This aspect of the implementation
may be changed when issue #113192 gets addressed
- Fixes issues mentioned in the comments on the original PR #125319

---------

Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com>
Co-authored-by: Chris B <beanz@abolishcrlf.org>
Co-authored-by: Justin Bogner <mail@justinbogner.com>
2025-03-05 17:04:10 -08:00
YunQiang Su
4bd3427321
IRBuilder: Add FMFSource parameter to CreateMaxNum/CreateMinNum (#129173)
In https://github.com/llvm/llvm-project/pull/112852, we claimed that
llvm.minnum and llvm.maxnum should treat +0.0>-0.0, while libc doesn't
require fmin(3)/fmax(3) for it.

Let's add FMFSource parameter to CreateMaxNum and CreateMinNum, so that
they can be used by CodeGenFunction::EmitBuiltinExpr of Clang.
2025-03-04 08:49:35 +08:00
metkarpoonam
23efe734fc
[HLSL] Add "or" intrinsic (#128979)
Include HLSL or_intrinsic, add codegen in CGBuiltin, and the
corresponding tests in or.hlsl. Additionally, incorporate
logical-operator-errors to handle both 'and' and 'or' semantic
diagnostics.
2025-02-28 13:54:13 -07:00
Virginia Cangelosi
2477f82db9
[clang] Update SVE load and store intrinsics to have FP8 variants (#126726) 2025-02-28 14:20:59 +00:00
Heejin Ahn
40566fd674
[WebAssembly] Generate invokes with llvm.wasm.(re)throw (#128105)
Even though `__builtin_wasm_throw`, which is lowered down to
`llvm.wasm.throw`, throws,
```cpp
try {
  __builtin_wasm_throw(0, obj);
} catch (...) {
}
```
does not generate `invoke`. This is because we have assumed the
intrinsic cannot be invoked, which doesn't make much sense. (See #128104
for the historical context)

#128104 made `llvm.wasm.throw` intrinsic invokable in the backend. This
actually generates `invoke`s in Clang for `__builtin_wasm_throw`.

While we're at it, this also generates `invoke`s for
`__builtin_wasm_rethrow`, which is actually not used anywhere in C++
support. I haven't deleted it just in case in may have uses later. (For
example, to support rethrow functionality that carries stack trace with
exnref)

Depends on #128104 for the CI to pass.
Fixes #124710.
2025-02-25 14:12:35 -08:00
Deric Cheung
305d273894
Reland "[HLSL] Implement the reflect HLSL function" (#125599)
This PR relands #122992.

A reland was attempted before (#123853), but it [failed to pass the
`sanitizer-aarch64-linux-bootstrap-hwasan`
buildbot](https://github.com/llvm/llvm-project/pull/123853#issuecomment-2608389396)
due to the test `llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll`

The issue has since been patched thanks to @vitalybuka, so the PR is
safe to reland without any changes.
See
https://github.com/llvm/llvm-project/pull/125599#discussion_r1966650839
and
https://github.com/llvm/llvm-project/pull/125599#discussion_r1966650839
2025-02-24 16:46:59 -08:00
Benjamin Maxwell
5a7ee431fd
[clang] Add __builtin_sincospi that lowers to llvm.sincospi.* (#127065)
This (only) adds the `__builtin` variant which lowers to the
`llvm.sincospi.*` intrinsic when `-fno-math-errno` is set.
2025-02-21 16:49:41 +00:00
Benjamin Maxwell
d595fc91ae
Revert "[clang] Lower modf builtin using llvm.modf intrinsic" (#127987)
Reverts llvm/llvm-project#126750

Revering while I investigate:
https://lab.llvm.org/buildbot/#/builders/72/builds/8406
2025-02-20 10:25:18 +00:00
Deric Cheung
1c762c288f
[HLSL] Implement the 'and' HLSL function (#127098)
Addresses #125604 

- Implements `and` as an HLSL builtin function
- The `and` HLSL builtin function gets lowered to the the LLVM `and`
instruction
2025-02-19 11:22:46 -08:00
Benjamin Maxwell
d804c83893
[clang] Lower modf builtin using llvm.modf intrinsic (#126750)
This updates the existing `modf[f|l]` builtin to be lowered via the
`llvm.modf.*` intrinsic (rather than directly to a library call).
2025-02-19 14:49:22 +00:00
Benjamin Maxwell
7781e1040d
[clang] Lower non-builtin sincos[f|l] calls to llvm.sincos.* when -fno-math-errno is set (#121763)
This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86 but can be
worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.

Follow up to #114086.
2025-02-19 10:13:41 +00:00
Krzysztof Drewniak
f7d03707d1
[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828)
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemneted, addressspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation)

To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` s from being produces, this
commit extends amdgcn.make.buffer.rsrc to also be variadic in its result
type, auto-upgrading old manglings.

The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* .

This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
2025-02-18 14:15:28 -06:00
Florian Hahn
50d10b5c1c
[Clang] Add __builtin_assume_dereferenceable to encode deref assumption. (#121789)
This patch adds a new __builtin_assume_dereferenceable to encode
dereferenceability of a pointer using llvm.assume with an operand
bundle.

For now the builtin only accepts constant sizes, I am planning to drop
this restriction in a follow-up change.

This can be used to better optimize cases where a pointer is known to be
dereferenceable, e.g. unconditionally loading from p2 when vectorizing
the loop.

    int *get_ptr();

    void foo(int* src, int x) {
      int *p2 = get_ptr();
      __builtin_assume_aligned(p2, 4);
      __builtin_assume_dereferenceable(p2, 4000);
      for (unsigned I = 0; I != 1000; ++I) {
        int x = src[I];
        if (x == 0)
          x = p2[I];
	 src[I] = x;
      }
    }


PR: https://github.com/llvm/llvm-project/pull/121789
2025-02-14 12:44:20 +01:00
Ulrich Weigand
adacbf68eb [SystemZ] Add codegen support for llvm.roundeven
This is straightforward as we already had all the necessary
instructions, they simply were not wired up.

Also allows implementing the vec_round intrinsic via the
standard llvm.roundeven IR instead of a platform intrinsic now.
2025-02-14 00:10:37 +01:00
Bill Wendling
2eb44aa0a9
[Clang][counted-by] Bail out of visitor for LValueToRValue cast (#125571)
An LValueToRValue cast shouldn't be ignored, so bail out of the visitor
if we encounter one.
2025-02-04 11:00:44 -08:00
Chandler Carruth
cd269fee05 [StrTable] Switch Clang builtins to use string tables
This both reapplies #118734, the initial attempt at this, and updates it
significantly.

First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.

It also restructures the builtins to support sharding across several
independent tables. This accomplishes two improvements from the
original PR:

1) It improves the APIs used significantly.

2) When builtins are defined from different sources (like SVE vs MVE in
   AArch64), this allows each of them to build their own string table
   independently rather than having to merge the string tables and info
   structures.

3) It allows each shard to factor out a common prefix, often cutting the
   size of the strings needed for the builtins by a factor two.

The second point is important both to allow different mechanisms of
construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc files), it also simply reduces the sizes
of these tables which is valuable given how large they are in some
cases. The third builds on that size reduction.

Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
2025-02-04 18:04:57 +00:00
Hans Wennborg
83ff9d4a34 Revert "[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099)"
This broke the build, see buildbot comments on the PR.

This reverts commit ee92122b53c7af26bb766e89e1d30ceb2fd5bb93 and
follow-up 5dccfd9283cd784758aa3d16fcb6e31f135c080f.
2025-02-04 11:19:20 +01:00
Reid Kleckner
ee92122b53
[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099)
This is similar in spirit to previous changes to make _mm_mfence
builtins to avoid conflicts with winnt.h and other MSVC ecosystem
headers that pre-declare compiler intrinsics as extern "C" symbols.

Also update the feature flag for _mm_prefetch to sse, which is more accurate than mmx.

This should fix issue #87515.
2025-02-03 14:05:58 -08:00
Bill Wendling
cff0a460ae
[Clang][counted_by] Refactor __builtin_dynamic_object_size on FAMs (#122198)
Refactoring of how __builtin_dynamic_object_size() is calculated for
flexible array members (in preparation for adding support for the
'counted_by' attribute on pointers in structs).

The only functionality change is that we use the already emitted Expr
code to build our calculations off of rather than re-emitting the Expr.
That allows the 'StructFieldAccess' visitor to sift through all casts
and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs
and calculate offsets based off of relative distances from that
MemberExpr.

The testcase passes execution tests.

Calculate the flexible array member's object size using these formulae
(note: if the calculation is negative, we return 0.): 

     struct p;
     struct s {
         /* ... */
         int count;
         struct p *array[] __attribute__((counted_by(count)));
     };   

1) 'ptr->array':

   count = ptr->count;

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   if (flexible_array_member_size < 0) 
       return 0;
   return flexible_array_member_size;

2) '&ptr->array[idx]':

   count = ptr->count;
   index = idx; 

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   index_size = index * flexible_array_member_base_size;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return flexible_array_member_size - index_size;

3) '&ptr->field':

   count = ptr->count;
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_offset = offsetof (struct s, field);
   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;

4) '&ptr->field_array[idx]':

   count = ptr->count;
   index = idx; 
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_base_size = sizeof (*ptr->field_array);
   field_offset = offsetof (struct s, field)
   field_offset += index * field_base_size;

   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;

---------

Signed-off-by: Bill Wendling <morbo@google.com>
2025-01-30 15:36:13 -08:00
Nikita Popov
1295aa2e81
[Clang] Add -fwrapv-pointer flag (#122486)
GCC supports three flags related to overflow behavior:
 * `-fwrapv`: Makes signed integer overflow well-defined.
 * `-fwrapv-pointer`: Makes pointer overflow well-defined.
* `-fno-strict-overflow`: Implies `-fwrapv -fwrapv-pointer`, making both
signed integer overflow and pointer overflow well-defined.

Clang currently only supports `-fno-strict-overflow` and `-fwrapv`, but
not `-fwrapv-pointer`.

This PR proposes to introduce `-fwrapv-pointer` and adjust the semantics
of `-fwrapv` to match GCC.

This allows signed integer overflow and pointer overflow to be
controlled independently, while `-fno-strict-overflow` still exists to
control both at the same time (and that option is consistent across GCC
and Clang).
2025-01-28 09:57:00 +01:00
Adam Yang
aab25f20f6
[HLSL][SPIRV][DXIL] Implement WaveActiveMax intrinsic (#123428)
```    - add clang builtin to Builtins.td
      - link builtin in hlsl_intrinsics
      - add codegen for spirv intrinsic and two directx intrinsics to retain
        signedness information of the operands in CGBuiltin.cpp
      - add semantic analysis in SemaHLSL.cpp
      - add lowering of spirv intrinsic to spirv backend in
        SPIRVInstructionSelector.cpp
      - add lowering of directx intrinsics to WaveActiveOp dxil op in
    DXIL.td

      - add test cases to illustrate passespendent pr merges.
```
Resolves #99170
2025-01-27 23:26:56 -08:00
Momchil Velikov
f75860f895
[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615)
This patch adds the following intrinsics:

* Fused multiply-add non-indexed

float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t,
fpm_t)
float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t,
fpm_t)
        
float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)
float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)
float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)
float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)

* Floating-point multiply-add long to half-precision (vector, by
element)

float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
    
* Floating-point multiply-add long-long to single-precision (vector, by
element)

float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
2025-01-28 00:38:44 +00:00
Momchil Velikov
804b81d39f
[AArch64] Add FP8 Neon intrinsics for dot-product (#123613)
This patch adds the following intrinsics:

float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t
vm, fpm_t fpm)
float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, fpm_t fpm)
    
float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
    
float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t
vm, fpm_t fpm)
float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, fpm_t fpm)

float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
2025-01-27 21:14:16 +00:00
Momchil Velikov
99bd2e3f12
[AArch64] Add Neon FP8 conversion intrinsics (#123612)
The patch adds the following intrinsics:

    bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm)
mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn,
float32x4_t vm, fpm_t fpm)
    
mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm)
mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t
fpm)

Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>
2025-01-27 17:32:47 +00:00
Momchil Velikov
87103a016f
[AArch64] Implement NEON FP8 vectors as VectorType (#123603)
Reimplement Neon FP8 vector types using attribute `neon_vector_type`
instead of having them as builtin types.
This allows to implement FP8 Neon intrinsics without the need to add
special cases for these types when using `__builtin_shufflevector`
or bitcast (using C-style cast operator) between vectors, both
extensively used in the generated code in `arm_neon.h`.
2025-01-27 10:41:53 +00:00
Phoebe Wang
ee2722fc88
[X86][AVX10.2-BF16] Remove [NE]P from intrinsic and instruction name (#123335)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2025-01-24 15:49:28 +08:00
Finn Plummer
0fe8e70c66
Revert "Reland "[HLSL] Implement the reflect HLSL function"" (#124046)
Reverts llvm/llvm-project#123853

The introduction of `reflect-error.ll` surfaced a bug with the use of
`report_fatal_error` in `SPIRVInstructionSelector` that was propagated
into the pr. This has caused a build-bot breakage, and the work to solve
the underlying issue is tracked here:
https://github.com/llvm/llvm-project/issues/124045. We can re-apply this
commit when the underlying issue is resolved.
2025-01-22 18:22:03 -08:00
Deric Cheung
2656928d0c
Reland "[HLSL] Implement the reflect HLSL function" (#123853)
This PR relands
[#122992](https://github.com/llvm/llvm-project/pull/122992).

Some machines were failing to run the `reflect-error.ll` test due to the
RUN lines
```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```
which failed when `spirv-tools` was not present on the machine due to
running the command `not` without any arguments.

These RUN lines have been removed since they don't actually test
anything new compared to the other two RUN lines due to the expected
error during instruction selection.
```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```
2025-01-22 13:29:19 -08:00
Oliver Stannard
c4ef805b0b
[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#121943)
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code for to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.

This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.

Fixes https://github.com/llvm/llvm-project/issues/111293.

This is a re-land of #120449, modified to allow any non-const pointer
type for the first argument.
2025-01-22 10:48:04 +00:00
schittir
65df99c208
[NFC] Avoid potential nullptr deref by using castAs<> (#123395)
Use castAs<> instead of getAs<>
2025-01-21 21:41:37 -08:00
Finn Plummer
4c91263045
Revert "[HLSL] Implement the reflect HLSL function" (#123846)
Reverts llvm/llvm-project#122992

Due to an included failing test-case the commit causes build failures.
2025-01-21 15:12:58 -08:00
Deric Cheung
dd860bcfb5
[HLSL] Implement the reflect HLSL function (#122992)
Fixes #99152

Tasks completed:

- Implement `reflect` in `clang/lib/Headers/hlsl/hlsl_intrinsics.h`
- Implement the `reflect` SPIR-V target built-in in
`clang/include/clang/Basic/BuiltinsSPIRV.td`
- Add a SPIR-V fast path in `clang/lib/Headers/hlsl/hlsl_detail.h` in
the form
 ```c++
#if (__has_builtin(__builtin_spirv_reflect))
  return __builtin_spirv_reflect(...);
 #else
   return ...; // regular behavior
 #endif
 ```
- Add codegen for the SPIR-V `reflect` built-in to
`EmitSPIRVBuiltinExpr` in `clang/lib/CodeGen/CGBuiltin.cpp`
- Add HLSL codegen tests to
`clang/test/CodeGenHLSL/builtins/reflect.hlsl`
- Add SPIR-V built-in codegen tests to
`clang/test/CodeGenSPIRV/Builtins/reflect.c`
- Add sema tests to `clang/test/SemaHLSL/BuiltIns/reflect-errors.hlsl`
- Add SPIR-V sema tests to
`clang/test/CodeGenSPIRV/Builtins/reflect-errors.c`
- Create the `int_spv_reflect` intrinsic in
`llvm/include/llvm/IR/IntrinsicsSPIRV.td`
- In `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` create the
`reflect` lowering and map it to `int_spv_reflect` in
`SPIRVInstructionSelector::selectIntrinsic`
- Create a SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/reflect.ll`

Additional tasks completed:

- Implement sema check for the `reflect` SPIR-V built-in in
`clang/lib/Sema/SemaSPIRV.cpp`
- Required for HLSL codegen to work via the SPIR-V fast path, because
the types defined in `clang/include/clang/Basic/BuiltinsSPIRV.td` are
being overridden
- Create SPIR-V backend error test case in
`llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll`
- Since `reflect` is only available in the GLSL extended instruction
set, using it in OpenCL should result in an error

Incomplete tasks:

- Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/opencl/reflect.ll`
- An OpenCL test is not applicable in this case because the [OpenCL
SPIR-V extended instruction
set](https://registry.khronos.org/SPIR-V/specs/unified1/OpenCL.ExtendedInstructionSet.100.html)
does not include a `reflect` function
2025-01-21 14:30:29 -08:00
David Green
6dc356d698 [Clang] Add numeric for iota.
Hopefuly fixes MSVC build after 547bfda56b2e3f3a4c6d2357d3566dcd3fa996ad.
2025-01-21 10:36:58 +00:00
David Green
547bfda56b
[AArch64] Improve bcvtn2 and remove aarch64_neon_bfcvt intrinsics (#120363)
This started out as trying to combine bf16 fpround to BFCVT2
instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in
favour of generating fpround instructions directly. This simplifies the
patterns and can lead to other optimizations. The BFCVT2 instruction is
adjusted to makes sure the types are valid, and a bfcvt2 is now
generated in more place. The old intrinsics are auto-upgraded to fptrunc
instructions too.
2025-01-21 09:16:04 +00:00
Ulrich Weigand
8424bf207e [SystemZ] Add support for new cpu architecture - arch15
This patch adds support for the next-generation arch15
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Detection of arch15 as host processor.
- Assembler/disassembler support for new instructions.
- Exploitation of new instructions for code generation.
- New vector (signed|unsigned|bool) __int128 data types.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10305.

Note: No currently available Z system supports the arch15
architecture.  Once new systems become available, the
official system name will be added as supported -march name.
2025-01-20 19:30:21 +01:00
Farzon Lotfi
eddeb36cf1
[SPIRV] add pre legalization instruction combine (#122839)
- Add the boilerplate to support instcombine in SPIRV
- instcombine length(X-Y) to distance(X,Y)
- switch HLSL's distance intrinsic to not special case for SPIRV.
- fixes #122766
- This RFC we were requested to add in the infra for pattern matching:
https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329/13
2025-01-17 14:46:14 -05:00
Adam Yang
4446a9849a
[HLSL][SPIRV][DXIL] Implement WaveActiveSum intrinsic (#118580)
```    - add clang builtin to Builtins.td
      - link builtin in hlsl_intrinsics
      - add codegen for spirv intrinsic and two directx intrinsics to retain
        signedness information of the operands in CGBuiltin.cpp
      - add semantic analysis in SemaHLSL.cpp
      - add lowering of spirv intrinsic to spirv backend in
        SPIRVInstructionSelector.cpp
      - add lowering of directx intrinsics to WaveActiveOp dxil op in
    DXIL.td

      - add test cases to illustrate passespendent pr merges.
```
Resolves #70106

---------

Co-authored-by: Finn Plummer <canadienfinn@gmail.com>
2025-01-16 10:35:23 -08:00
Ashley Coleman
4f48abff0f
[HLSL] Implement elementwise firstbitlow builtin (#116858)
Closes https://github.com/llvm/llvm-project/issues/99116

Implements `firstbitlow` by extracting common functionality from
`firstbithigh` into a shared function while also fixing a bug for an edge
case where `u64x3` and larger vectors will attempt to create vectors
larger than the SPRIV max of 4.
---------

Co-authored-by: Steven Perron <stevenperron@google.com>
2025-01-15 15:36:50 -07:00