llvm-project

Author	SHA1	Message	Date
Nikita Popov	b384d6d6cc	[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100 ) This is an expensive header, only include it where needed. Move some functions out of line to achieve that. This reduces time to build clang by ~0.5% in terms of instructions retired.	2025-04-03 08:04:19 +02:00
Jonathan Thackray	a1a74c9e80	[NFC][clang] Remove superfluous header files after refactor in #132252 (#132495 ) Remove superfluous header files after refactor in #132252	2025-03-26 14:45:00 +00:00
Jonathan Thackray	7f920e2e5f	[NFC][clang] Split clang/lib/CodeGen/CGBuiltin.cpp into target-specific files (#132252 ) clang/lib/CodeGen/CGBuiltin.cpp is over 1MB long (>23k LoC), and can take minutes to recompile (depending on compiler and host system) when modified, and 5 seconds for clangd to update for every edit. Splitting this file was discussed in this thread: https://discourse.llvm.org/t/splitting-clang-s-cgbuiltin-cpp-over-23k-lines-long-takes-1min-to-compile/ and the idea has received a number of +1 votes, hence this change.	2025-03-21 19:09:39 +00:00
Phoebe Wang	09feaa9261	Revert "[X86][AVX10.2] Support YMM rounding new instructions (#101825 )" (#132362 ) This reverts commit 0dba5381d8c8e4cadc32a067bf2fe5e3486ae53d. YMM rounding was removed from AVX10 whitepaper. Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343 The MINMAX and SATURATING CONVERT instructions will be removed as a follow up.	2025-03-21 20:12:57 +08:00
Rahul Joshi	88a51d2392	[Clang][NFC] Code cleanup in CGBuiltin.cpp (#132060 ) - Use `Intrinsic::` directly instead of `llvm::Intrinsic::`. - Eliminate redundant `nullptr` for some `CreateIntrinsic` calls. - Eliminate redundant `ArrayRef` casts. - Use C++17 structured binding instead of `std::tie`.	2025-03-20 15:46:28 -07:00
Aaron Ballman	d781ac1cf0	[C23] Add __builtin_c23_va_start (#131166 ) This builtin is supported by GCC and is a way to improve diagnostic behavior for va_start in C23 mode. C23 no longer requires a second argument to the va_start macro in support of variadic functions with no leading parameters. However, we still want to diagnose passing more than two arguments, or diagnose when passing something other than the last parameter in the variadic function. This also updates the freestanding <stdarg.h> header to use the new builtin, same as how GCC works. Fixes #124031	2025-03-15 11:01:53 -04:00
Juan Manuel Martinez Caamaño	7decd04626	[Clang] Add __builtin_elementwise_exp10 in the same fashion as exp/exp2 (#130746 ) Clang has __builtin_elementwise_exp and __builtin_elementwise_exp2 intrinsics, but no __builtin_elementwise_exp10. There doesn't seem to be a good reason not to expose the exp10 flavour of this intrinsic too. This commit introduces this intrinsic following the same pattern as the exp and exp2 versions. Fixes: SWDEV-519541	2025-03-12 09:20:29 +01:00
Benjamin Maxwell	fb397ab1e5	Reland "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#130761 ) Reverts `c40f0fe434` Original description: This updates the existing modf[f\|l] builtin to be lowered via the llvm.modf.* intrinsic (rather than directly to a library call). The Windows 32-bit x86 missing `modff` symbol issue should have been solved in: https://github.com/llvm/llvm-project/pull/130636.	2025-03-11 14:55:33 +00:00
Hans Wennborg	c40f0fe434	Revert "Reland "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#129885 )" This broke modff calls on 32-bit x86 Windows. See comment on the PR. > This updates the existing modf[f\|l] builtin to be lowered via the > llvm.modf.* intrinsic (rather than directly to a library call). > > The legalization issues exposed by the original PR (#126750) should have > been fixed in #128055 and #129264. This reverts commit cd1d9a8fab05524a27ffdb251f6def37786b5cc1.	2025-03-10 16:35:03 +01:00
Chris B	e85e29c299	[HLSL] select scalar overloads for vector conditions (#129396 ) This PR adds scalar/vector overloads for vector conditions to the `select` builtin, and updates the sema checking and codegen to allow scalars to extend to vectors. Fixes #126570	2025-03-09 16:01:12 -05:00
Matt Arsenault	bfea84946d	clang: Hack around opencl enqueue_block using wrong ABI for aggregrate (#130011 ) EmitAggExprToLValue started wrapping the temporary alloca in an addrspacecast at some point. We take the direct type from this as the pointer argument for the runtime function type, but this isn't correct. Technically, we should be querying the target's ABI for what IR to produce for this sequence. The assumption seems to always have been that this will be indirectly passed with byval (or byref). I started working on a patch to go through the ABI handling, but it seems to require more time and/or clang expertise than I have at the moment.	2025-03-06 23:13:28 +07:00
Benjamin Maxwell	cd1d9a8fab	Reland "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#129885 ) Reverts llvm/llvm-project#127987 Original description: This updates the existing modf[f\|l] builtin to be lowered via the llvm.modf.* intrinsic (rather than directly to a library call). The legalization issues exposed by the original PR (#126750) should have been fixed in #128055 and #129264.	2025-03-06 09:43:59 +00:00
Deric C.	b4ecebe745	[HLSL] [DXIL] Implement the AddUint64 HLSL function and the UAddc DXIL op (#127137 ) Fixes #99205. - Implements the HLSL intrinsic `AddUint64` used to perform unsigned 64-bit integer addition by using pairs of unsigned 32-bit integers instead of native 64-bit types - The LLVM intrinsic `uadd_with_overflow` is used in the implementation of `AddUint64` in `CGBuiltin.cpp` - The DXIL op `UAddc` was defined in `DXIL.td`, and a lowering of the LLVM intrinsic `uadd_with_overflow` to the `UAddc` DXIL op was implemented in `DXILOpLowering.cpp` Notes: - `__builtin_addc` was not able to be used to implement `AddUint64` in `hlsl_intrinsics.h` because its `CarryOut` argument is a pointer, and pointers are not supported in HLSL - A lowering of the LLVM intrinsic `uadd_with_overflow` to SPIR-V [already exists](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/SPIRV/llvm-intrinsics/uadd.with.overflow.ll) - When lowering the LLVM intrinsic `uadd_with_overflow` to the `UAddc` DXIL op, the anonymous struct type `{ i32, i1 }` is replaced with a named struct type `%dx.types.i32c`. This aspect of the implementation may be changed when issue #113192 gets addressed - Fixes issues mentioned in the comments on the original PR #125319 --------- Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com> Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com> Co-authored-by: Chris B <beanz@abolishcrlf.org> Co-authored-by: Justin Bogner <mail@justinbogner.com>	2025-03-05 17:04:10 -08:00
YunQiang Su	4bd3427321	IRBuilder: Add FMFSource parameter to CreateMaxNum/CreateMinNum (#129173 ) In https://github.com/llvm/llvm-project/pull/112852, we claimed that llvm.minnum and llvm.maxnum should treat +0.0>-0.0, while libc doesn't require fmin(3)/fmax(3) for it. Let's add FMFSource parameter to CreateMaxNum and CreateMinNum, so that they can be used by CodeGenFunction::EmitBuiltinExpr of Clang.	2025-03-04 08:49:35 +08:00
metkarpoonam	23efe734fc	[HLSL] Add "or" intrinsic (#128979 ) Include HLSL or_intrinsic, add codegen in CGBuiltin, and the corresponding tests in or.hlsl. Additionally, incorporate logical-operator-errors to handle both 'and' and 'or' semantic diagnostics.	2025-02-28 13:54:13 -07:00
Virginia Cangelosi	2477f82db9	[clang] Update SVE load and store intrinsics to have FP8 variants (#126726 )	2025-02-28 14:20:59 +00:00
Heejin Ahn	40566fd674	[WebAssembly] Generate invokes with llvm.wasm.(re)throw (#128105 ) Even though `__builtin_wasm_throw`, which is lowered down to `llvm.wasm.throw`, throws, ```cpp try { __builtin_wasm_throw(0, obj); } catch (...) { } ``` does not generate `invoke`. This is because we have assumed the intrinsic cannot be invoked, which doesn't make much sense. (See #128104 for the historical context) #128104 made `llvm.wasm.throw` intrinsic invokable in the backend. This actually generates `invoke`s in Clang for `__builtin_wasm_throw`. While we're at it, this also generates `invoke`s for `__builtin_wasm_rethrow`, which is actually not used anywhere in C++ support. I haven't deleted it just in case in may have uses later. (For example, to support rethrow functionality that carries stack trace with exnref) Depends on #128104 for the CI to pass. Fixes #124710.	2025-02-25 14:12:35 -08:00
Deric Cheung	305d273894	Reland "[HLSL] Implement the reflect HLSL function" (#125599 ) This PR relands #122992. A reland was attempted before (#123853), but it [failed to pass the `sanitizer-aarch64-linux-bootstrap-hwasan` buildbot](https://github.com/llvm/llvm-project/pull/123853#issuecomment-2608389396) due to the test `llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll` The issue has since been patched thanks to @vitalybuka, so the PR is safe to reland without any changes. See https://github.com/llvm/llvm-project/pull/125599#discussion_r1966650839 and https://github.com/llvm/llvm-project/pull/125599#discussion_r1966650839	2025-02-24 16:46:59 -08:00
Benjamin Maxwell	5a7ee431fd	[clang] Add `__builtin_sincospi` that lowers to `llvm.sincospi.` (#127065 ) This (only) adds the `__builtin` variant which lowers to the `llvm.sincospi.` intrinsic when `-fno-math-errno` is set.	2025-02-21 16:49:41 +00:00
Benjamin Maxwell	d595fc91ae	Revert "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#127987 ) Reverts llvm/llvm-project#126750 Revering while I investigate: https://lab.llvm.org/buildbot/#/builders/72/builds/8406	2025-02-20 10:25:18 +00:00
Deric Cheung	1c762c288f	[HLSL] Implement the 'and' HLSL function (#127098 ) Addresses #125604 - Implements `and` as an HLSL builtin function - The `and` HLSL builtin function gets lowered to the the LLVM `and` instruction	2025-02-19 11:22:46 -08:00
Benjamin Maxwell	d804c83893	[clang] Lower modf builtin using `llvm.modf` intrinsic (#126750 ) This updates the existing `modf[f\|l]` builtin to be lowered via the `llvm.modf.*` intrinsic (rather than directly to a library call).	2025-02-19 14:49:22 +00:00
Benjamin Maxwell	7781e1040d	[clang] Lower non-builtin sincos[f\|l] calls to llvm.sincos.* when -fno-math-errno is set (#121763 ) This will allow vectorizing these calls (after a few more patches). This should not change the codegen for targets that enable the use of AA during the codegen (in `TargetSubtargetInfo::useAA()`). This includes targets such as AArch64. This notably does not include x86 but can be worked around by passing `-mllvm -combiner-global-alias-analysis=true` to clang. Follow up to #114086.	2025-02-19 10:13:41 +00:00
Krzysztof Drewniak	f7d03707d1	[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828 ) Attempting to pass a `ptr addrspace(7)` to functions that take `ptr` arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7) to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP operations on buffer resources, which can't be GEP'd. (However, note that, while unimplemneted, addressspacecast from ptr addrspace(7) to ptr is legal - it's just an effective address computation) To resolve this problem, and thus prevent illegal `getelementptr T, ptr addrspace(8) %x, ...` s from being produces, this commit extends amdgcn.make.buffer.rsrc to also be variadic in its result type, auto-upgrading old manglings. The logic for handling a make.buffer.rsrc in instruction selection remains untouched and expects the output type to be a ptr addrspace(8), as does the Clang lowering for its builtin (the pointer-to-pointer version might want a different name in clang). LowerBufferFatPointers has been updated to lower amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* . This'll also make exposing buffer fat pointers in Clang easier, since you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.	2025-02-18 14:15:28 -06:00
Florian Hahn	50d10b5c1c	[Clang] Add __builtin_assume_dereferenceable to encode deref assumption. (#121789 ) This patch adds a new __builtin_assume_dereferenceable to encode dereferenceability of a pointer using llvm.assume with an operand bundle. For now the builtin only accepts constant sizes, I am planning to drop this restriction in a follow-up change. This can be used to better optimize cases where a pointer is known to be dereferenceable, e.g. unconditionally loading from p2 when vectorizing the loop. int get_ptr(); void foo(int src, int x) { int *p2 = get_ptr(); __builtin_assume_aligned(p2, 4); __builtin_assume_dereferenceable(p2, 4000); for (unsigned I = 0; I != 1000; ++I) { int x = src[I]; if (x == 0) x = p2[I]; src[I] = x; } } PR: https://github.com/llvm/llvm-project/pull/121789	2025-02-14 12:44:20 +01:00
Ulrich Weigand	adacbf68eb	[SystemZ] Add codegen support for llvm.roundeven This is straightforward as we already had all the necessary instructions, they simply were not wired up. Also allows implementing the vec_round intrinsic via the standard llvm.roundeven IR instead of a platform intrinsic now.	2025-02-14 00:10:37 +01:00
Bill Wendling	2eb44aa0a9	[Clang][counted-by] Bail out of visitor for LValueToRValue cast (#125571 ) An LValueToRValue cast shouldn't be ignored, so bail out of the visitor if we encounter one.	2025-02-04 11:00:44 -08:00
Chandler Carruth	cd269fee05	[StrTable] Switch Clang builtins to use string tables This both reapplies #118734, the initial attempt at this, and updates it significantly. First, it uses the newly added `StringTable` abstraction for string tables, and simplifies the construction to build the string table and info arrays separately. This should reduce any `constexpr` compile time memory or CPU cost of the original PR while significantly improving the APIs throughout. It also restructures the builtins to support sharding across several independent tables. This accomplishes two improvements from the original PR: 1) It improves the APIs used significantly. 2) When builtins are defined from different sources (like SVE vs MVE in AArch64), this allows each of them to build their own string table independently rather than having to merge the string tables and info structures. 3) It allows each shard to factor out a common prefix, often cutting the size of the strings needed for the builtins by a factor two. The second point is important both to allow different mechanisms of construction (for example a `.def` file and a tablegen'ed `.inc` file, or different tablegen'ed `.inc files), it also simply reduces the sizes of these tables which is valuable given how large they are in some cases. The third builds on that size reduction. Initially, we use this new sharding rather than merging tables in AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system works, as without further changes these still push scaling limits. Subsequent commits will more deeply leverage the new structure, including using the prefix capabilities which cannot be easily factored out here and requires deep changes to the targets.	2025-02-04 18:04:57 +00:00
Hans Wennborg	83ff9d4a34	Revert "[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099 )" This broke the build, see buildbot comments on the PR. This reverts commit ee92122b53c7af26bb766e89e1d30ceb2fd5bb93 and follow-up 5dccfd9283cd784758aa3d16fcb6e31f135c080f.	2025-02-04 11:19:20 +01:00
Reid Kleckner	ee92122b53	[Win/X86] Make _m_prefetch[w] builtins to avoid winnt.h conflicts (#115099 ) This is similar in spirit to previous changes to make _mm_mfence builtins to avoid conflicts with winnt.h and other MSVC ecosystem headers that pre-declare compiler intrinsics as extern "C" symbols. Also update the feature flag for _mm_prefetch to sse, which is more accurate than mmx. This should fix issue #87515.	2025-02-03 14:05:58 -08:00
Bill Wendling	cff0a460ae	[Clang][counted_by] Refactor __builtin_dynamic_object_size on FAMs (#122198 ) Refactoring of how __builtin_dynamic_object_size() is calculated for flexible array members (in preparation for adding support for the 'counted_by' attribute on pointers in structs). The only functionality change is that we use the already emitted Expr code to build our calculations off of rather than re-emitting the Expr. That allows the 'StructFieldAccess' visitor to sift through all casts and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs and calculate offsets based off of relative distances from that MemberExpr. The testcase passes execution tests. Calculate the flexible array member's object size using these formulae (note: if the calculation is negative, we return 0.): struct p; struct s { /* ... / int count; struct p array[] __attribute__((counted_by(count))); }; 1) 'ptr->array': count = ptr->count; flexible_array_member_base_size = sizeof (ptr->array); flexible_array_member_size = count flexible_array_member_base_size; if (flexible_array_member_size < 0) return 0; return flexible_array_member_size; 2) '&ptr->array[idx]': count = ptr->count; index = idx; flexible_array_member_base_size = sizeof (ptr->array); flexible_array_member_size = count flexible_array_member_base_size; index_size = index * flexible_array_member_base_size; if (flexible_array_member_size < 0 \|\| index < 0) return 0; return flexible_array_member_size - index_size; 3) '&ptr->field': count = ptr->count; sizeof_struct = sizeof (struct s); flexible_array_member_base_size = sizeof (ptr->array); flexible_array_member_size = count flexible_array_member_base_size; field_offset = offsetof (struct s, field); offset_diff = sizeof_struct - field_offset; if (flexible_array_member_size < 0) return 0; return offset_diff + flexible_array_member_size; 4) '&ptr->field_array[idx]': count = ptr->count; index = idx; sizeof_struct = sizeof (struct s); flexible_array_member_base_size = sizeof (ptr->array); flexible_array_member_size = count flexible_array_member_base_size; field_base_size = sizeof (ptr->field_array); field_offset = offsetof (struct s, field) field_offset += index field_base_size; offset_diff = sizeof_struct - field_offset; if (flexible_array_member_size < 0 \|\| index < 0) return 0; return offset_diff + flexible_array_member_size; --------- Signed-off-by: Bill Wendling <morbo@google.com>	2025-01-30 15:36:13 -08:00
Nikita Popov	1295aa2e81	[Clang] Add -fwrapv-pointer flag (#122486 ) GCC supports three flags related to overflow behavior: * `-fwrapv`: Makes signed integer overflow well-defined. * `-fwrapv-pointer`: Makes pointer overflow well-defined. * `-fno-strict-overflow`: Implies `-fwrapv -fwrapv-pointer`, making both signed integer overflow and pointer overflow well-defined. Clang currently only supports `-fno-strict-overflow` and `-fwrapv`, but not `-fwrapv-pointer`. This PR proposes to introduce `-fwrapv-pointer` and adjust the semantics of `-fwrapv` to match GCC. This allows signed integer overflow and pointer overflow to be controlled independently, while `-fno-strict-overflow` still exists to control both at the same time (and that option is consistent across GCC and Clang).	2025-01-28 09:57:00 +01:00
Adam Yang	aab25f20f6	[HLSL][SPIRV][DXIL] Implement `WaveActiveMax` intrinsic (#123428 ) ``` - add clang builtin to Builtins.td - link builtin in hlsl_intrinsics - add codegen for spirv intrinsic and two directx intrinsics to retain signedness information of the operands in CGBuiltin.cpp - add semantic analysis in SemaHLSL.cpp - add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp - add lowering of directx intrinsics to WaveActiveOp dxil op in DXIL.td - add test cases to illustrate passespendent pr merges. ``` Resolves #99170	2025-01-27 23:26:56 -08:00
Momchil Velikov	f75860f895	[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615 ) This patch adds the following intrinsics: * Fused multiply-add non-indexed float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) * Floating-point multiply-add long to half-precision (vector, by element) float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) * Floating-point multiply-add long-long to single-precision (vector, by element) float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-28 00:38:44 +00:00
Momchil Velikov	804b81d39f	[AArch64] Add FP8 Neon intrinsics for dot-product (#123613 ) This patch adds the following intrinsics: float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-27 21:14:16 +00:00
Momchil Velikov	99bd2e3f12	[AArch64] Add Neon FP8 conversion intrinsics (#123612 ) The patch adds the following intrinsics: bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm) mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm) Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>	2025-01-27 17:32:47 +00:00
Momchil Velikov	87103a016f	[AArch64] Implement NEON FP8 vectors as VectorType (#123603 ) Reimplement Neon FP8 vector types using attribute `neon_vector_type` instead of having them as builtin types. This allows to implement FP8 Neon intrinsics without the need to add special cases for these types when using `__builtin_shufflevector` or bitcast (using C-style cast operator) between vectors, both extensively used in the generated code in `arm_neon.h`.	2025-01-27 10:41:53 +00:00
Phoebe Wang	ee2722fc88	[X86][AVX10.2-BF16] Remove [NE]P from intrinsic and instruction name (#123335 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2025-01-24 15:49:28 +08:00
Finn Plummer	0fe8e70c66	Revert "Reland "[HLSL] Implement the `reflect` HLSL function"" (#124046 ) Reverts llvm/llvm-project#123853 The introduction of `reflect-error.ll` surfaced a bug with the use of `report_fatal_error` in `SPIRVInstructionSelector` that was propagated into the pr. This has caused a build-bot breakage, and the work to solve the underlying issue is tracked here: https://github.com/llvm/llvm-project/issues/124045. We can re-apply this commit when the underlying issue is resolved.	2025-01-22 18:22:03 -08:00
Deric Cheung	2656928d0c	Reland "[HLSL] Implement the `reflect` HLSL function" (#123853 ) This PR relands [#122992](https://github.com/llvm/llvm-project/pull/122992). Some machines were failing to run the `reflect-error.ll` test due to the RUN lines ```llvm ; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %} ; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %} ``` which failed when `spirv-tools` was not present on the machine due to running the command `not` without any arguments. These RUN lines have been removed since they don't actually test anything new compared to the other two RUN lines due to the expected error during instruction selection. ```llvm ; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 \| FileCheck %s ; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 \| FileCheck %s ```	2025-01-22 13:29:19 -08:00
Oliver Stannard	c4ef805b0b	[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#121943 ) Re-write the sema and codegen for the atomic_test_and_set and atomic_clear builtin functions to go via AtomicExpr, like the other atomic builtins do. This simplifies the code, because AtomicExpr already handles things like generating code for to dynamically select the memory ordering, which was duplicated for these builtins. This also fixes a few crash bugs, one when passing an integer to the pointer argument, and one when using an array. This also adds diagnostics for the memory orderings which are not valid for atomic_clear according to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which were missing before. Fixes https://github.com/llvm/llvm-project/issues/111293. This is a re-land of #120449, modified to allow any non-const pointer type for the first argument.	2025-01-22 10:48:04 +00:00
schittir	65df99c208	[NFC] Avoid potential nullptr deref by using castAs<> (#123395 ) Use castAs<> instead of getAs<>	2025-01-21 21:41:37 -08:00
Finn Plummer	4c91263045	Revert "[HLSL] Implement the `reflect` HLSL function" (#123846 ) Reverts llvm/llvm-project#122992 Due to an included failing test-case the commit causes build failures.	2025-01-21 15:12:58 -08:00
Deric Cheung	dd860bcfb5	[HLSL] Implement the `reflect` HLSL function (#122992 ) Fixes #99152 Tasks completed: - Implement `reflect` in `clang/lib/Headers/hlsl/hlsl_intrinsics.h` - Implement the `reflect` SPIR-V target built-in in `clang/include/clang/Basic/BuiltinsSPIRV.td` - Add a SPIR-V fast path in `clang/lib/Headers/hlsl/hlsl_detail.h` in the form ```c++ #if (__has_builtin(__builtin_spirv_reflect)) return __builtin_spirv_reflect(...); #else return ...; // regular behavior #endif ``` - Add codegen for the SPIR-V `reflect` built-in to `EmitSPIRVBuiltinExpr` in `clang/lib/CodeGen/CGBuiltin.cpp` - Add HLSL codegen tests to `clang/test/CodeGenHLSL/builtins/reflect.hlsl` - Add SPIR-V built-in codegen tests to `clang/test/CodeGenSPIRV/Builtins/reflect.c` - Add sema tests to `clang/test/SemaHLSL/BuiltIns/reflect-errors.hlsl` - Add SPIR-V sema tests to `clang/test/CodeGenSPIRV/Builtins/reflect-errors.c` - Create the `int_spv_reflect` intrinsic in `llvm/include/llvm/IR/IntrinsicsSPIRV.td` - In `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` create the `reflect` lowering and map it to `int_spv_reflect` in `SPIRVInstructionSelector::selectIntrinsic` - Create a SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/reflect.ll` Additional tasks completed: - Implement sema check for the `reflect` SPIR-V built-in in `clang/lib/Sema/SemaSPIRV.cpp` - Required for HLSL codegen to work via the SPIR-V fast path, because the types defined in `clang/include/clang/Basic/BuiltinsSPIRV.td` are being overridden - Create SPIR-V backend error test case in `llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll` - Since `reflect` is only available in the GLSL extended instruction set, using it in OpenCL should result in an error Incomplete tasks: - Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/opencl/reflect.ll` - An OpenCL test is not applicable in this case because the [OpenCL SPIR-V extended instruction set](https://registry.khronos.org/SPIR-V/specs/unified1/OpenCL.ExtendedInstructionSet.100.html) does not include a `reflect` function	2025-01-21 14:30:29 -08:00
David Green	6dc356d698	[Clang] Add numeric for iota. Hopefuly fixes MSVC build after 547bfda56b2e3f3a4c6d2357d3566dcd3fa996ad.	2025-01-21 10:36:58 +00:00
David Green	547bfda56b	[AArch64] Improve bcvtn2 and remove aarch64_neon_bfcvt intrinsics (#120363 ) This started out as trying to combine bf16 fpround to BFCVT2 instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in favour of generating fpround instructions directly. This simplifies the patterns and can lead to other optimizations. The BFCVT2 instruction is adjusted to makes sure the types are valid, and a bfcvt2 is now generated in more place. The old intrinsics are auto-upgraded to fptrunc instructions too.	2025-01-21 09:16:04 +00:00
Ulrich Weigand	8424bf207e	[SystemZ] Add support for new cpu architecture - arch15 This patch adds support for the next-generation arch15 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Detection of arch15 as host processor. - Assembler/disassembler support for new instructions. - Exploitation of new instructions for code generation. - New vector (signed\|unsigned\|bool) __int128 data types. - New LLVM intrinsics for certain new instructions. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10305. Note: No currently available Z system supports the arch15 architecture. Once new systems become available, the official system name will be added as supported -march name.	2025-01-20 19:30:21 +01:00
Farzon Lotfi	eddeb36cf1	[SPIRV] add pre legalization instruction combine (#122839 ) - Add the boilerplate to support instcombine in SPIRV - instcombine length(X-Y) to distance(X,Y) - switch HLSL's distance intrinsic to not special case for SPIRV. - fixes #122766 - This RFC we were requested to add in the infra for pattern matching: https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329/13	2025-01-17 14:46:14 -05:00
Adam Yang	4446a9849a	[HLSL][SPIRV][DXIL] Implement `WaveActiveSum` intrinsic (#118580 ) ``` - add clang builtin to Builtins.td - link builtin in hlsl_intrinsics - add codegen for spirv intrinsic and two directx intrinsics to retain signedness information of the operands in CGBuiltin.cpp - add semantic analysis in SemaHLSL.cpp - add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp - add lowering of directx intrinsics to WaveActiveOp dxil op in DXIL.td - add test cases to illustrate passespendent pr merges. ``` Resolves #70106 --------- Co-authored-by: Finn Plummer <canadienfinn@gmail.com>	2025-01-16 10:35:23 -08:00
Ashley Coleman	4f48abff0f	[HLSL] Implement elementwise firstbitlow builtin (#116858 ) Closes https://github.com/llvm/llvm-project/issues/99116 Implements `firstbitlow` by extracting common functionality from `firstbithigh` into a shared function while also fixing a bug for an edge case where `u64x3` and larger vectors will attempt to create vectors larger than the SPRIV max of 4. --------- Co-authored-by: Steven Perron <stevenperron@google.com>	2025-01-15 15:36:50 -07:00

1 2 3 4 5 ...

2139 Commits