This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations in `float` (without
intermediate rounding) or not. With `-ffloat16-excess-precision=none`, the
LLVM `half` type is always used, and all backends are expected to
legalize it correctly.
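A minimal sketch of what the two modes mean, assuming a target where `_Float16` arithmetic is promoted:

```cpp
// Under -ffloat16-excess-precision=standard, the whole expression may be
// evaluated in float and rounded to half only once, at the return; under
// =none, every operation is performed on the LLVM `half` type, rounding
// after each step.
_Float16 mul_add(_Float16 x, _Float16 y, _Float16 z) {
  return x * y + z;
}
```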
Replaces the XOP/AVX512 per-element rotation/funnel-shift builtins with the generic `__builtin_elementwise_fshl`/`__builtin_elementwise_fshr` builtins.
We still have the uniform-immediate variants to handle next.
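For example, a per-element rotate is a funnel shift with both inputs equal; a sketch (`v4u` is just a local typedef):

```cpp
typedef unsigned v4u __attribute__((ext_vector_type(4)));

// Rotate-left is fshl with the same vector as both halves of the funnel;
// this is the kind of pattern the removed per-element builtins now map to.
v4u rotl(v4u x, v4u amt) {
  return __builtin_elementwise_fshl(x, x, amt);
}
```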
Part of #153152
The following intrinsics were replaced by a combination of
`__builtin_shufflevector` and `__builtin_convertvector`:
- `__builtin_ia32_vcvtph2ps`
- `__builtin_ia32_vcvtph2ps256`
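Roughly the replacement pattern (a sketch; the real headers handle the casts and element counts per intrinsic):

```cpp
typedef _Float16 v8hf __attribute__((ext_vector_type(8)));
typedef float v4sf __attribute__((ext_vector_type(4)));

// Take the low four half elements, then widen each one to float.
v4sf cvtph2ps(v8hf h) {
  return __builtin_convertvector(__builtin_shufflevector(h, h, 0, 1, 2, 3),
                                 v4sf);
}
```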
Fixes #152749
The following intrinsics were replaced by `__builtin_elementwise_fma`:
- `__builtin_ia32_vfmaddps(256)`
- `__builtin_ia32_vfmaddpd(256)`
- `__builtin_ia32_vfmaddph(256)`
- `__builtin_ia32_vfmaddbf16(128 | 256 | 512)`
All the aforementioned `__builtin_ia32_vfmadd*` intrinsics are
equivalent to a `__builtin_elementwise_fma`, so keeping them is an
unnecessary indirection.
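A minimal sketch of the equivalence on a 128-bit float vector:

```cpp
typedef float v4sf __attribute__((ext_vector_type(4)));

// Per-element fused multiply-add, which is exactly what the removed
// __builtin_ia32_vfmadd* builtins exposed.
v4sf fma4(v4sf a, v4sf b, v4sf c) {
  return __builtin_elementwise_fma(a, b, c);
}
```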
Fixes [#152461](https://github.com/llvm/llvm-project/issues/152461)
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
I fixed support for varargs functions
(previously it didn't crash, but the codegen was incorrect).
I added tests for structs and unions, which already work. With the
multivalue ABI they crash in the backend, so I added a Sema check that
rejects structs and unions for that ABI.
It will also crash in the backend if passed an int128 or float128 type.
This is handled by the instcombine added in #147930; there is no need
for any clang-specific folding. This is NFC, as all clang tests for
`__arm_in_streaming_mode()` used -O1, which applies the LLVM
instcombines.
Tests whether the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.
Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
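A hedged usage sketch of the function-pointer test; the builtin spelling below is assumed from this patch series, not guaranteed:

```cpp
// Guard an indirect call: if the runtime signature of fp doesn't match its
// static type, calling it would trap, so bail out instead.
int call_checked(int (*fp)(int), int arg) {
  if (!__builtin_wasm_test_function_pointer_signature(fp))
    return -1;
  return fp(arg);
}
```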
"amdgpu-as" is way too vague and doesn't give enough context.
We may want to support it on normal atomics too, to control the synchronized (ordered) AS.
If we do that, the name has to be less vague.
FMV priority is the value returned by a polymorphic function. On RISC-V
and X86 targets a 32-bit value is enough. On AArch64 we currently need
64 bits and will soon exceed that. APInt seems to be a suitable
replacement for uint64_t, presumably with minimal compile-time overhead.
It allows bit manipulation, comparison, and variable bit width.
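A sketch of why APInt fits; the bit widths and positions here are illustrative, not the actual FMV encoding:

```cpp
#include "llvm/ADT/APInt.h"
using llvm::APInt;

// Priorities wider than 64 bits compare naturally.
bool higherPriority(const APInt &A, const APInt &B) {
  return A.ugt(B); // unsigned comparison at any bit width
}

APInt withFeature(APInt Priority, unsigned FeatureBit) {
  Priority.setBit(FeatureBit); // e.g. bit 70 no longer overflows uint64_t
  return Priority;
}
```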
- [x] Implement refract using HLSL source in hlsl_intrinsics.h
- [x] Implement the refract SPIR-V target built-in in
clang/include/clang/Basic/BuiltinsSPIRV.td
- [x] Add sema checks for refract to CheckSPIRVBuiltinFunctionCall in
clang/lib/Sema/SemaSPIRV.cpp
- [x] Add codegen for spv refract to EmitSPIRVBuiltinExpr in
CGBuiltin.cpp
- [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/refract.hlsl
- [x] Add spv codegen test to clang/test/CodeGenSPIRV/Builtins/refract.c
- [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/refract-errors.hlsl
- [x] Add spv sema tests to
clang/test/SemaSPIRV/BuiltIns/refract-errors.c
- [x] Create the int_spv_refract intrinsic in IntrinsicsSPIRV.td
- [x] In SPIRVInstructionSelector.cpp create the refract lowering and
map it to int_spv_refract in SPIRVInstructionSelector::selectIntrinsic.
- [x] Create SPIR-V backend test case in
llvm/test/CodeGen/SPIRV/hlsl-intrinsics/refract.ll
- [x] Check for what OpenCL support is needed.
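For reference, the standard refract semantics being implemented, written out as a scalar 3-vector sketch (the builtin applies the same formula component-wise with vector dot products):

```cpp
#include <cmath>

// refract(I, N, eta) per the usual HLSL/GLSL definition: returns the zero
// vector on total internal reflection.
struct V3 { float x, y, z; };
static float dot3(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

V3 refract3(V3 I, V3 N, float eta) {
  float d = dot3(N, I);
  float k = 1.0f - eta * eta * (1.0f - d * d);
  if (k < 0.0f)
    return {0.0f, 0.0f, 0.0f}; // total internal reflection
  float s = eta * d + std::sqrt(k);
  return {eta * I.x - s * N.x, eta * I.y - s * N.y, eta * I.z - s * N.z};
}
```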
Resolves https://github.com/llvm/llvm-project/issues/99153
XAndesBFHCvt provides two builtin functions for converting between
float and bf16. Users can use them to convert bf16 values loaded from
memory to float, perform arithmetic operations, then convert the results
back to bf16 and store them to memory.
The load/store and move operations for bf16 will be handled in a later
patch.
The patch adds intrinsics and lowering logic for GlobalSize,
GlobalOffset, SubgroupMaxSize, NumWorkgroups, WorkgroupSize,
WorkgroupId, LocalInvocationId, GlobalInvocationId, SubgroupSize,
NumSubgroups, SubgroupId and SubgroupLocalInvocationId SPIR-V builtins.
The patch also extends spv_thread_id, spv_group_id and
spv_thread_id_in_group to return anyint rather than i32. This allows the
intrinsics to support the OpenCL environment.
For each of the intrinsics, new clang builtins were added, as well as a
binding in the SPIR-V "friendly" format. The original format doesn't
define such a binding (it uses global variables), but global variables
cannot express the Input storage class normally required by the
environment specs, and builtin functions are the usual approach for other
backends and programming models.
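A hedged sketch of what such a friendly-format binding looks like; the exact prototype follows the translator convention and is assumed here:

```cpp
// Assumed spelling: one builtin function per SPIR-V builtin variable,
// indexed by dimension, instead of an Input global variable.
extern "C" unsigned long __spirv_BuiltInGlobalInvocationId(int dim);

unsigned long global_id_x() { return __spirv_BuiltInGlobalInvocationId(0); }
```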
Adds support for the __sys Clang builtin for AArch64.
__sys is a long-standing MSVC intrinsic used to manage caches, TLBs, etc.
by writing to system registers:
* It takes a macro-generated constant and uses it to form the AArch64 SYS instruction, which is MSR with op0=1. The macro drops op0 and expects the implementation to hardcode it to 1 in the encoding.
* The bulk of its use is in systems code (kernels, hypervisors, boot environments, firmware)
* It has an unused return value due to an MSVC cut-and-paste error
Implementation:
* Clang builtin, sharing code with Read/WriteStatusReg
* Hardcodes op0=1
* Explicitly returns 0
* Code-format change from clang-format
* Unit tests included
* Not limited to the MSVC environment, as it's generally useful and neutral
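A hedged sketch of the intrinsic's shape; the MSVC-style prototype is assumed, and the register constant is illustrative rather than a real maintenance operation:

```cpp
// Assumed MSVC-style prototype: int __sys(int, __int64); op0 is hardcoded
// to 1, so the constant encodes only op1:Cn:Cm:op2.
void sysreg_write(int reg_encoding, long long value) {
  (void)__sys(reg_encoding, value); // return value is vestigial, always 0
}
```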
This marks ffloor as legal provided that Armv8 and NEON are present (or
fullfp16 for the fp16 instructions). The existing arm_neon_vrintm
intrinsics are auto-upgraded to llvm.floor.
If this is OK I will update the other vrint intrinsics.
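The ACLE rounding intrinsics keep working; they now go through the generic floor intrinsic. A sketch, assuming compilation for Armv8 with NEON:

```cpp
#include <arm_neon.h>

// vrndmq_f32 rounds each lane toward minus infinity; it now lowers via
// llvm.floor.v4f32 rather than llvm.arm.neon.vrintm.
float32x4_t floor_lanes(float32x4_t v) { return vrndmq_f32(v); }
```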
CreateVScale took a scaling parameter that had a single use outside of
IRBuilder, with all other callers having to create a redundant
ConstantInt. To work around this, some code preferred to use
CreateIntrinsic directly.
This patch simplifies CreateVScale to return a call to the llvm.vscale()
intrinsic and nothing more. As well as simplifying the existing call
sites, I've also migrated the uses of CreateIntrinsic.
Whilst IRBuilder used CreateVScale's scaling parameter as part of the
implementations of CreateElementCount and CreateTypeSize, I have
follow-on work to switch them to the NUW variety, and thus they would
stop using CreateVScale's scaling as well. To prepare for this I have
moved the multiplication and constant folding into the implementations
of CreateElementCount and CreateTypeSize.
As a final step I have replaced some callers of CreateVScale with
CreateElementCount where it's clear from the code they wanted the
latter.
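A sketch of the before/after shape of the API, with signatures abbreviated:

```cpp
#include "llvm/IR/IRBuilder.h"

// Before: CreateVScale(Scaling) multiplied vscale by Scaling, so most
// callers passed a redundant ConstantInt of 1.
// After: it returns a bare call to llvm.vscale(), and any multiplication
// lives inside CreateElementCount / CreateTypeSize.
llvm::Value *emitScalableCount(llvm::IRBuilderBase &B) {
  llvm::Value *VScale = B.CreateVScale(B.getInt64Ty());
  (void)VScale;
  // Equivalent to vscale * 4, with the scaling folded by CreateElementCount:
  return B.CreateElementCount(B.getInt64Ty(),
                              llvm::ElementCount::getScalable(4));
}
```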
The patch introduces __builtin_spirv_generic_cast_to_ptr_explicit, which
is lowered to the llvm.spv.generic.cast.to.ptr.explicit intrinsic.
The SPIR-V builtins are now split into separate files:
BuiltinsSPIRVCore.td,
BuiltinsSPIRVVK.td for Vulkan-specific builtins, BuiltinsSPIRVCL.td for
OpenCL-specific builtins,
and BuiltinsSPIRVCommon.td for common ones.
The patch also introduces a new header defining its SPIR-V friendly
equivalents (__spirv_GenericCastToPtrExplicit_ToGlobal,
__spirv_GenericCastToPtrExplicit_ToLocal and
__spirv_GenericCastToPtrExplicit_ToPrivate). The functions are declared
as aliases to the new builtin, allowing C-like languages to have a
definition to rely on as well as gaining proper front-end diagnostics.
The motivation for the header is to provide a stable binding for
applications or libraries (such as SYCL) and to allow non-SPIR-V targets
to provide an implementation (via libclc, or similar to how it is done
for gpuintrin.h).
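A hedged usage sketch; the prototype below is assumed for illustration, and the real declarations, with proper address-space qualifiers, come from the new header:

```cpp
extern "C" int *__spirv_GenericCastToPtrExplicit_ToGlobal(int *p);

// Yields a usable pointer when p's dynamic address space is global,
// otherwise null, mirroring OpGenericCastToPtrExplicit.
int *to_global_int(int *p) {
  return __spirv_GenericCastToPtrExplicit_ToGlobal(p);
}
```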
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.
Our current support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying their semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single, consistently named set of intrinsics:
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
…__builtin_scalbn
Clang generates library calls for `__builtin_*` functions, which can be a
problem for GPUs that cannot handle them. This patch generates calls to
the device implementation for __builtin_logb and to the ldexp intrinsic
for __builtin_scalbn.
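The equivalence being relied on: for radix-2 floats, scalbn and ldexp are the same operation, which is why the ldexp path is a valid lowering. A sketch:

```cpp
#include <float.h>
static_assert(FLT_RADIX == 2, "scalbn == ldexp only for radix-2 floats");

// On GPU targets this now lowers to the device ldexp path instead of a
// library call that may not exist.
double scale_by_pow2(double x, int n) { return __builtin_scalbn(x, n); }
```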
This PR adds an amdgcn.load.to.lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.
This commit adds the intrinsic and its lowerings for address spaces 1 and
7, and updates the MLIR wrappers to use it (loosening the restrictions
on loads to LDS along the way to match the ground truth from target
features).
It also plumbs the intrinsic through to clang.
This patch adds fp8 variants to existing intrinsics whose operation
doesn't depend on the arguments being a specific type.
It also changes the in-memory representation of the mfloat8 type from
`i8` to `<1 x i8>`.