72 Commits

Author SHA1 Message Date
Nikita Popov
246a64a12e
[Clang] Rename HasLegalHalfType -> HasFastHalfType (NFC) (#153163)
This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations on float (without
intermediate rounding) or not. For `-ffloat16-excess-precision=none` the
LLVM `half` type will always be used, and all backends are expected to
legalize it correctly.
2025-08-18 09:23:48 +02:00
Simon Pilgrim
91fff70740
[clang][X86] Replace vprot/vprol/vpror/vshld/vshrd intrinsics with __builtin_elementwise_fshl/fshr (#153229)
Replaces the XOP/AVX512 per-element rotation/funnel shift builtins with the generic __builtin_elementwise_fshl/fshr

We still have uniform immediate variants to handle next.

Part of #153152
2025-08-13 10:28:30 +01:00
moorabbit
f8653cecd1
[Clang][X86] Replace F16C vcvtph2ps/256 intrinsics with (convert|shuffle)vector builtins (#152911)
The following intrinsics were replaced by a combination of
`__builtin_shufflevector` and `__builtin_convertvector`:
- `__builtin_ia32_vcvtph2ps`
- `__builtin_ia32_vcvtph2ps256`

Fixes #152749
2025-08-12 16:32:19 +01:00
moorabbit
989c0d2526
[Clang][X86] Replace unnecessary vfmadd* builtins with element_wise_fma (#152545)
The following intrinsics were replaced by `__builtin_elementwise_fma`:
- `__builtin_ia32_vfmaddps(256)`
- `__builtin_ia32_vfmaddpd(256)`
- `__builtin_ia32_vfmaddph(256)`
- `__builtin_ia32_vfmaddbf16(128 | 256 | 512)`

All the aforementioned `__builtin_ia32_vfmadd*` intrinsics are
equivalent to a `__builtin_elementwise_fma`, so keeping them is an
unnecessary indirection.

Fixes [#152461](https://github.com/llvm/llvm-project/issues/152461)

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-08 20:51:15 +01:00
Hood Chatham
b9c328480c
[clang][WebAssembly] Support reftypes & varargs in test_function_pointer_signature (#150921)
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).

I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.

It will also crash in the backend if passed an int128 or float128 type.
2025-08-07 13:07:04 -07:00
Stanislav Mekhanoshin
34aed0ed56
[AMDGPU] Add gfx1250 wmma_scale[16]_f32_32x16x128_f4 instructions (#152194) 2025-08-05 15:15:21 -07:00
zGoldthorpe
d7074b63ed
[Clang][AMDGPU] Add builtins for some buffer resource atomics (#149216)
This patch exposes builtins for atomic `add`, `max`, and `min` operations that
operate over buffer resource pointers.
2025-08-05 11:04:15 -06:00
Stanislav Mekhanoshin
a153e83e41
[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036) 2025-08-04 19:16:34 -07:00
Benjamin Maxwell
af44d87e0d
[clang][SME] Remove folding of __arm_in_streaming_mode() (NFC) (#150917)
This is handled by the instcombine added in #147930; there is no need
for any clang-specific folding. NFC as all clang tests for
`__arm_in_streaming_mode()` used -O1, which applies the LLVM
instcombines.
2025-07-29 10:42:45 +01:00
Hood Chatham
15b03687ff
[WebAssembly,clang] Add __builtin_wasm_test_function_pointer_signature (#150201)
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.

Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
2025-07-25 16:52:39 -07:00
Changpeng Fang
d7a38a94cd
[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540) 2025-07-24 16:23:33 -07:00
Pierre van Houtryve
cd1b84caa8
[NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (#148627)
"amdgpu-as" is way too vague and doesn't give enough context.
We may want to support it on normal atomics too, to control the synchronized (ordered) AS.
If we do that, the name has to be less vague.
2025-07-24 12:41:57 +02:00
Alexandros Lamprineas
3ab64c5b29
[NFC][Clang][FMV] Make FMV priority data type future proof. (#150079)
FMV priority is the returned value of a polymorphic function. On RISC-V
and X86 targets a 32-bit value is enough. On AArch64 we currently need
64 bits and we will soon exceed that. APInt seems to be a suitable
replacement for uint64_t, presumably with minimal compile time overhead.
It allows bit manipulation, comparison and variable bit width.
2025-07-23 10:37:29 +01:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Shilei Tian
602d43cfd1
[Clang][AMDGPU] Add the missing builtin __builtin_amdgcn_sqrt_bf16 (#149447)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 08:43:08 -04:00
Shilei Tian
aecd44818a
[AMDGPU] Add support for v_tanh_f16 on gfx1250 (#149439)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-18 00:21:04 -04:00
Shilei Tian
7e105fbdbe
[AMDGPU] Add support for v_tanh_f32 on gfx1250 (#149360)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 15:42:35 -04:00
Shilei Tian
fd5fc76c91
[AMDGPU] Add support for v_cos_bf16 on gfx1250 (#149355)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 14:43:34 -04:00
Shilei Tian
a102342990
[AMDGPU] Add support for v_sin_bf16 on gfx1250 (#149241)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:49:45 -04:00
Shilei Tian
a6b5ece75e
[AMDGPU] Add support for v_exp_bf16 on gfx1250 (#149229)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 08:46:01 -04:00
Shilei Tian
ad6d5d2821
[AMDGPU] Add support for v_log_bf16 on gfx1250 (#149201)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:09:34 -04:00
Shilei Tian
7d2a58e87d
[AMDGPU] Add support for v_rsq_bf16 on gfx1250 (#149194)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-16 19:06:03 -04:00
raoanag
056f0a10b3
[HLSL][DXIL] Implement refract intrinsic (#147342)
- [x] Implement refract using HLSL source in hlsl_intrinsics.h
- [x] Implement the refract SPIR-V target built-in in
clang/include/clang/Basic/BuiltinsSPIRV.td
- [x] Add sema checks for refract to CheckSPIRVBuiltinFunctionCall in
clang/lib/Sema/SemaSPIRV.cpp
- [x] Add codegen for spv refract to EmitSPIRVBuiltinExpr in
CGBuiltin.cpp
- [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/refract.hlsl
- [x] Add spv codegen test to clang/test/CodeGenSPIRV/Builtins/refract.c
- [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/refract-errors.hlsl
- [x] Add spv sema tests to
clang/test/SemaSPIRV/BuiltIns/refract-errors.c
- [x] Create the int_spv_refract intrinsic in IntrinsicsSPIRV.td
- [x] In SPIRVInstructionSelector.cpp create the refract lowering and
map it to int_spv_refract in SPIRVInstructionSelector::selectIntrinsic.
- [x] Create SPIR-V backend test case in
llvm/test/CodeGen/SPIRV/hlsl-intrinsics/refract.ll
- [x] Check for what OpenCL support is needed.

Resolves https://github.com/llvm/llvm-project/issues/99153
2025-07-16 11:28:55 -07:00
Jim Lin
3e4153c97b
[RISCV] Implement Builtins for XAndesBFHCvt extension. (#148804)
XAndesBFHCvt provides two builtins functions for converting between
float and bf16. Users can use them to convert bf16 values loaded from
memory to float, perform arithmetic operations, then convert them back
to bf16 and store them to memory.

The load/store and move operations for bf16 will be handled in a later
patch.
2025-07-16 16:13:31 +08:00
Changpeng Fang
c962f2b29d
AMDGPU: Implement builtins for gfx1250 wmma instructions (#148991)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 18:17:12 -07:00
Shilei Tian
dabc8e2ec1
[AMDGPU] Add support for v_rcp_bf16 on gfx1250 (#148916)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-15 16:12:51 -04:00
RolandF77
d9e21a92a7
[PowerPC] Add DMF basic builtins (#145372)
Add support for PPC Dense Math basic builtins dmsetdmrz, dmmr, dmxor.
2025-07-15 13:46:07 -04:00
Shilei Tian
d7ec80c897
[AMDGPU] Add support for v_tanh_bf16 on gfx1250 (#147425)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-14 16:30:18 -04:00
Victor Lomuller
27c9b55659
[SPIRV] Add more id and range builtIns (#143909)
The patch adds intrinsics and lowering logic for GlobalSize,
GlobalOffset, SubgroupMaxSize, NumWorkgroups, WorkgroupSize,
WorkgroupId, LocalInvocationId, GlobalInvocationId, SubgroupSize,
NumSubgroups, SubgroupId and SubgroupLocalInvocationId SPIR-V builtins.

The patch also extend spv_thread_id, spv_group_id and
spv_thread_id_in_group to return anyint rather than i32. This allows the
intrinsics to support the opencl environment.

For each of the intrinsics, new clang builtins were added as well as a
binding for the SPIR-V "friendly" format. The original format doesn't
define such binding (uses global variables) but it is not possible to
express the Input SC which is normally required by the environement
specs, and using builtin functions is the most usual approach for other
backend and programming models.
2025-07-09 13:52:06 +01:00
David Green
9fcea2e465 [ARM] Add neon vector support for roundeven
As per #142559, this marks froundeven as legal for Neon and upgrades the
existing arm.neon.vrintn intrinsics.
2025-07-04 15:27:33 +01:00
David Green
ec35065789 [ARM] Add neon vector support for rint
As per #142559, this marks frint as legal for Neon and upgrades the existing
arm.neon.vrintx intrinsics.
2025-07-03 21:27:48 +01:00
David Green
1f8f477bd0 [ARM] Add neon vector support for trunc
As per #142559, this marks ftrunc as legal for Neon and upgrades the existing
arm.neon.vrintz intrinsics.
2025-07-03 07:41:13 +01:00
Adam Glass
ed27f18e32
__sys builtin support for AArch64 (#146456)
Adds support for __sys Clang builtin for AArch64

__sys is a long existing MSVC intrinsic used to manage caches, tlbs, etc
by writing to system registers:
* It takes a macro-generated constant and uses it to form the AArch64 SYS instruction which is MSR with op0=1. The macro drops op0 and expects the implementation to hardcode it to 1 in the encoding.
* Volume use is in systems code (kernels, hypervisors, boot environments, firmware)
* Has an unused return value due to MSVC cut/paste error

Implementation:
* Clang builtin, sharing code with Read/WriteStatusReg
* Hardcodes the op0=1
* Explicitly returns 0
* Code-format change from clang-format
* Unittests included
* Not limited to MSVC-environment as its generally useful and neutral
2025-07-02 10:17:01 -07:00
David Green
5332534b9c [ARM] Add neon vector support for ceil
As per #142559, this marks fceil as legal for Neon and upgrades the existing
arm.neon.vrintp intrinsics.
2025-07-01 15:41:10 +01:00
David Green
6bd9ff04af [ARM] Add neon vector support for round
As per #142559, this marks fround as legal for Neon and upgrades the existing
arm.neon.vrinta intrinsics.
2025-06-30 17:15:26 +01:00
Changpeng Fang
1f5f381920
AMDGPU: Implement intrinsic/builtins for gfx1250 load transpose instructions (#146289) 2025-06-29 14:33:31 -07:00
David Green
dcc9e36b18
[ARM] Add neon vector support for floor (#142559)
This marks ffloor as legal providing that armv8 and neon is present (or
fullfp16 for the fp16 instructions). The existing arm_neon_vrintm
intrinsics are auto-upgraded to llvm.floor.

If this is OK I will update the other vrint intrinsics.
2025-06-29 11:37:16 +01:00
Adam Glass
d9a7b16479
InterlockedAdd_*, InterlockedAdd64_* support for AArch64 (#145607)
This PR adds support for InterlockedAdd_{acq, nf, rel}, and
InterlockedAdd64_{acq, nf, rel} for Aarch64.
2025-06-25 12:09:30 -07:00
Reid Kleckner
948cc91188
Reapply "[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099)" (#138360)
This reverts commit 83ff9d4a34b1e579dd809759d13b70b8837f0cde.

Don't change the builtin signature of _mm_prefetch this time.
2025-06-24 22:07:07 -06:00
Kazu Hirata
ae372bfca8
[CodeGen] Use range-based for loops (NFC) (#145142) 2025-06-21 08:20:57 -07:00
Paul Walker
f43aaf90df
[NFC][LLVM] Refactor IRBuilder::Create{VScale,ElementCount,TypeSize}. (#142803)
CreateVScale took a scaling parameter that had a single use outside of
IRBuilder with all other callers having to create a redundant
ConstantInt. To work round this some code perferred to use
CreateIntrinsic directly.

This patch simplifies CreateVScale to return a call to the llvm.vscale()
intrinsic and nothing more. As well as simplifying the existing call
sites I've also migrated the uses of CreateIntrinsic.

Whilst IRBuilder used CreateVScale's scaling parameter as part of the
implementations of CreateElementCount and CreateTypeSize, I have
follow-on work to switch them to the NUW varaiety and thus they would
stop using CreateVScale's scaling as well. To prepare for this I have
moved the multiplication and constant folding into the implementations
of CreateElementCount and CreateTypeSize.

As a final step I have replaced some callers of CreateVScale with
CreateElementCount where it's clear from the code they wanted the
latter.
2025-06-10 12:35:59 +01:00
Victor Lomuller
c474f8f240
[clang][SPIRV] Add builtin for OpGenericCastToPtrExplicit and its SPIR-V friendly binding (#137805)
The patch introduce __builtin_spirv_generic_cast_to_ptr_explicit which
is lowered to the llvm.spv.generic.cast.to.ptr.explicit intrinsic.

The SPIR-V builtins are now split into 3 differents file:
BuiltinsSPIRVCore.td,
BuiltinsSPIRVVK.td for Vulkan specific builtins, BuiltinsSPIRVCL.td for
OpenCL specific builtins
and BuiltinsSPIRVCommon.td for common ones.

The patch also introduces a new header defining its SPIR-V friendly
equivalent (__spirv_GenericCastToPtrExplicit_ToGlobal,
__spirv_GenericCastToPtrExplicit_ToLocal and
__spirv_GenericCastToPtrExplicit_ToPrivate). The functions are declared
as aliases to the new builtin allowing C-like languages to have a
definition to rely on as well as gaining proper front-end diagnostics.

The motivation for the header is to provide a stable binding for
applications or library (such as SYCL) and allows non SPIR-V targets to
provide an implementation (via libclc or similar to how it is done for
gpuintrin.h).
2025-05-29 15:19:40 +02:00
Alex MacLean
3a84a4e55d
Reland "[NVPTX] Unify and extend barrier{.cta} intrinsic support" (#141143)
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.

Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
2025-05-22 19:38:10 -07:00
Alex Maclean
e72d8b2553 Revert "[NVPTX] Unify and extend barrier{.cta} intrinsic support (#140615)"
This reverts commit 735209c0688b10a66c24750422b35d8c2ad01bb5.
2025-05-22 17:28:43 +00:00
Alex MacLean
735209c068
[NVPTX] Unify and extend barrier{.cta} intrinsic support (#140615)
Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
2025-05-21 08:14:15 -07:00
Jim Lin
d561d595c4
[RISCV] Implement intrinsics for XAndesVPackFPH (#140007)
This patch implements clang intrinsic support for XAndesVPackFPH.

The document for the intrinsics can be found at:

https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/intrinsic_funcs.adoc#andes-vector-packed-fp16-extensionxandesvpackfph
and with policy variants

https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/policy_funcs/intrinsic_funcs.adoc#andes-vector-packed-fp16-extensionxandesvpackfph

Co-authored-by: Tony Chuan-Yue Yuan <yuan593@andestech.com>
2025-05-20 13:16:51 +08:00
choikwa
77de8a0c0a
[AMDGPU][clang] provide device implementation for __builtin_logb and … (#129347)
…__builtin_scalbn

Clang generates library calls for __builtin_* functions which can be a
problem for GPUs that cannot handle them. This patch generates call to
device implementation for __builtin_logb and ldexp intrinsic for
__builtin_scalbn.
2025-05-19 14:11:31 -04:00
Krzysztof Drewniak
4bdd116b80
[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)
This PR adds a amdgns_load_to_lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.

This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).

It also plumbs the intrinsic through to clang.
2025-05-19 07:15:04 -07:00
Kiva
af083d09bd
[RISCV] Add zihintpause LLVM/Clang intrinsic (#139519)
This PR adds the missing intrinsic `__builtin_riscv_pause` for the
zihintpause extension.

Spec:
https://five-embeddev.com/riscv-user-isa-manual/Priv-v1.12/zihintpause.html
Fixes #129961
2025-05-16 14:20:53 +08:00
Lukacma
6fc0312919
[Clang][AArch64] Add fp8 variants for untyped NEON intrinsics (#128019)
This patch adds fp8 variants to existing intrinsics, whose operation
doesn't depend on arguments being a specific type.

It also changes mfloat8 type representation in memory from `i8` to
`<1xi8>`
2025-05-15 14:01:41 +01:00