llvm-project

Author	SHA1	Message	Date
neonetizen	e11a31f4c7	[CIR][AArch64] Lower FP16 vduph lane intrinsics (#186955 ) From #185382. Lower `vduph_lane_f16` and `vduph_laneq_f16` to `cir::VecExtractOp`. Tests moved from `v8.2a-neon-intrinsics-generic.c` to a new CIR-enabled test file. I followed the notes made in #185852 (BF16).	2026-04-06 19:12:34 +01:00
Andrzej Warzyński	38c53b3eb9	[clang][cir][nfc] Fix comments, add missing EOF (#190623 )	2026-04-06 18:06:57 +01:00
albertbolt1	8d7823ea8f	[CIR][AArch64] Added vector intrinsics for shift left (#187516 ) Added vector intrinsics for vshlq_n_s8 vshlq_n_s16 vshlq_n_s32 vshlq_n_s64 vshlq_n_u8 vshlq_n_u16 vshlq_n_u32 vshlq_n_u64 vshl_n_s8 vshl_n_s16 vshl_n_s32 vshl_n_s64 vshl_n_u8 vshl_n_u16 vshl_n_u32 vshl_n_u64. These cover all the vector intrinsics for constant shift left. The method followed: 1) Quad-word vectors have the form `64x2`, `32x4`, `16x8`, or `8x16`, and the shift amount is a constant, but shift left needs both operands to be vectors. So the constant shift is converted into a vector of the matching form (for `64x2`, the constant becomes a `64x2` vector); this process is also called a splat. 2) After the splat, the lhs and rhs have the same size, so the shift left can be applied. 3) One issue remains: ops[0] is not of the right type. For quad words it falls back to the default int8*16 in the function, so it is converted to the required type with a bitcast; `8x16` and `64x2` have the same total width, so the bitcast yields the vector in the right form. Wrote test cases for all the intrinsics listed above. #185382	2026-04-06 17:00:38 +01:00
Eli Friedman	9471fabf8a	[clang] Fix issues with const/pure on varargs function. (#190252 ) There are two related issues here. On the declaration/definition side, we need to make sure the markings are conservative. Then on the caller side, we need to make sure we don't access parameters that don't exist. Fixes #187535.	2026-04-03 13:57:35 -07:00
Florian Hahn	6476619f30	[Matrix] Use matrix element type for TBAA nodes. (#190029 ) Matrix loads and stores are accesses of their element types. Emit TBAA nodes using their element type to allow more precise TBAA alias analysis. PR: https://github.com/llvm/llvm-project/pull/190029	2026-04-03 20:11:04 +00:00
Amr Hesham	2108252f0e	[clang] Fixed a crash when explicitly casting to atomic complex (#172163 ) Fixed a crash when explicitly casting a scalar to an atomic complex. resolve: #114885	2026-04-03 19:28:20 +02:00
Amr Hesham	f2dff15995	[clang] Fixed a crash when explicitly casting between atomic complex types (#172210 ) Fixed a crash when explicitly casting between atomic complex types resolve: #172208	2026-04-02 22:55:43 +02:00
Justin Stitt	43233b8aae	[Clang] Add missing __ob_trap check for sign change (#188340 ) Add a missing OBTrapInvolved check before EmitIntegerSignChangeCheck(). This is considered "missing" as a previous attempt (https://github.com/llvm/llvm-project/pull/185772) to properly add an `__ob_trap` backdoor missed this particular instance. This backdoor is needed because we want `__ob_trap` types to be picky about implicit conversions (including implicit sign change): ```c unsigned int __ob_trap big = 4294967295; (signed int)big; // should trap! ``` Move the `OBTrapInvolved` setup to the top of the function so it can be used in all the places we need it.	2026-04-02 10:51:46 -07:00
Tony Guillot	42b6a6faaa	[Clang] Fixed the behavior of C23 auto when an array type was specified for a `char *` (#189722 ) At the time of the implementation of [N3007](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3007.htm) in Clang, when an array type was specified, an error was emitted unless the deduced type was a `char *`. After further inspection of the C standard, it turns out that the inferred type of a `char[]` should be deduced to a `char *`, which should emit an error if an array type is specified with `auto`. This now invalidates the following cases: ```c auto s1[] = "test"; auto s2[4] = "test"; auto s3[5] = "test"; ``` Fixes #162694	2026-04-02 18:40:21 +02:00
Trung Nguyen	9dc1da6e87	[clang] Add support for MSVC force inline attrs (#185282 ) Add support for `[[msvc::forceinline]]` and `[[msvc::forceinline_calls]]`. `[[msvc::forceinline]]` is equivalent to Microsoft's `__forceinline` when placed before a function declaration. Unlike `__forceinline`, `[[msvc::forceinline]]` works with lambdas. `[[msvc::forceinline_calls]]` is similar to `[[clang::always_inline]]` but only works on statements. Both are implemented as aliases of `[[clang::always_inline]]` with special checks. Fixes #186539.	2026-04-02 16:42:26 +02:00
wanglei	76fc936175	[Clang][LoongArch] Align LSX/LASX built-in signatures with intrinsic types to avoid lax conversions (#189900 ) Update the built-in signatures in BuiltinsLoongArchLSX.def and BuiltinsLoongArchLASX.def to precisely match the vector types used in the corresponding intrinsic headers (lsxintrin.h and lasxintrin.h). This alignment ensures that these intrinsics can be compiled successfully even when -flax-vector-conversions=none is specified, since the built-in arguments no longer rely on implicit vector type conversions. Added new test cases to verify the macro-defined LSX/LASX intrinsic interfaces under -flax-vector-conversions=none. Fixes #189898	2026-04-02 16:11:22 +08:00
Florian Hahn	b46f8fa622	[Matrix] Add tests checking TBAA emission for matrix types (NFC). (#189953 ) PR: https://github.com/llvm/llvm-project/pull/189953	2026-04-01 13:31:09 +00:00
VASU SHARMA	2313989499	[UBSAN] [NFC] pre-commit tests for null, alignment, bounds checks (#176210 ) PR to add precommit tests to document current UBSAN behavior for aggregate copy operations. The test covers: - Sanitizers: null, alignment, bounds - Type variants: plain, _Atomic (C), volatile (C) - Operand forms: arr[idx], *ptr - Operations: assignment, initialization, initializer list, variadic args, nested member access - C++ specific: direct/brace/copy-list init, new expressions, casts, operator=, virtual base init - Bounds checking: in-bounds access, past-the-end access (index == size), beyond bounds access, dynamic index --------- Co-authored-by: vasu-ibm <Vasu.Sharma2@ibm.com> Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>	2026-04-01 14:04:28 +05:30
Alexis Engelke	74e84c0cf5	[Clang] Fix getTerminator() use for -fasync-exceptions (#189644 )	2026-03-31 12:50:25 +00:00
Alex Voicu	18e6958903	[SPIRV][AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (#134016 ) This change adds two builtins for AMDGPU: - `__builtin_amdgcn_processor_is`, which is similar in observable behaviour with `__builtin_cpu_is`, except that it is never "evaluated" at run time; - `__builtin_amdgcn_is_invocable`, which is behaviourally similar with `__has_builtin`, except that it is not a macro (i.e. not evaluated at preprocessing time). Neither of these are `constexpr`, even though when compiling for concrete (i.e. `gfxXXX` / `gfxXXX-generic`) targets they get evaluated in Clang, so they shouldn't tear the AST too badly / at all for multi-pass compilation cases like HIP. They can only be used in specific contexts (as args to control structures). The motivation for adding these is two-fold: - as a nice to have, it provides an AST-visible way to incorporate architecture specific code, rather than having to rely on macros and the preprocessor, which burn in the choice quite early; - as a must have, it allows featureful AMDGCN flavoured SPIR-V to be produced, where target specific capability is guarded and chosen or discarded when finalising compilation for a concrete target; this is built atop the Specialization Constant concept which is described in the SPIR-V specification under section [2.12 Specialization](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_specialization_2). I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome). --------- Co-authored-by: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com> Co-authored-by: Voicu <avoicu@amd.com>	2026-03-30 23:02:26 +01:00
Jean-Michel Gorius	65cb5c3975	[clang][x86] Fix the return type of the cvtpd2dq builtin (#189254 ) The CVTPD2DQ instruction converts packed 64-bit floating-point values to packed 32-bit signed integer values. This patch fixes the return type of the corresponding builtin, which previously returned a vector of two 64-bit signed integers. The new behavior is in line with the return type of the CVTTPD2DQ builtin.	2026-03-30 10:38:21 +00:00
Owen Anderson	3f2e24726a	[CHERI] Allow @llvm.clear_cache to accept pointers in address spaces other than 0. (#189283 ) Co-Authored-by: Jessica Clarke <jrtc27@jrtc27.com>	2026-03-30 09:20:49 +02:00
Kamran Yousafzai	1264ffc4cc	[clang][RISC-V] fixed fp calling convention for fpcc eligible structs for risc-v (#110690 ) The code generated for calls with FPCC eligible structs as arguments doesn't consider the bitfield, which results in a store crossing the boundary of the memory allocated using alloca. For example, for the code: ``` struct __attribute__((packed, aligned(1))) S { const float f0; unsigned f1 : 1; }; unsigned func(struct S arg) { return arg.f1; } ``` The generated IR is: ``` define dso_local signext i32 @func( float [[TMP0:%.]], i32 [[TMP1:%.]]) #[[ATTR0:[0-9]+]] { [[ENTRY:.:]] [[ARG:%.]] = alloca [[STRUCT_S:%.]], align 1 [[TMP2:%.]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 0 store float [[TMP0]], ptr [[TMP2]], align 1 [[TMP3:%.]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 1 store i32 [[TMP1]], ptr [[TMP3]], align 1 [[F1:%.]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[ARG]], i32 0, i32 1 [[BF_LOAD:%.]] = load i8, ptr [[F1]], align 1 [[BF_CLEAR:%.]] = and i8 [[BF_LOAD]], 1 [[BF_CAST:%.]] = zext i8 [[BF_CLEAR]] to i32 ret i32 [[BF_CAST]] ``` Here, `store i32 [[TMP1]], ptr [[TMP3]], align 1` crosses the boundary of the allocated memory. After optimizations (EarlyCSEPass), the remaining IR is: ``` define dso_local noundef signext i32 @func( float [[TMP0:%.]], i32 [[TMP1:%.]]) local_unnamed_addr #[[ATTR0:[0-9]+]] { [[ENTRY:.:]] ret i32 0 ``` The patch trims the second member of the struct, using the bitwidth to decide the appropriate integer type, and the test shows the results of this patch. Note that the bug is seen only when the `f` extension is enabled for FPCC eligibility. Co-authored-by: muhammad.kamran4 <muhammad.kamran@esperantotech.com>	2026-03-27 16:55:11 -07:00
Pau Sum	0f81923735	[CIR][AArch64] Upstream vmull_/vmull_high_ and vmul_p8/vmul_high_p8 Neon builtins (#188371 ) Add CIR generation for AArch64 NEON builtins `vmull_` and `vmull_high_`. The accompanying tests from [AArch64/neon-intrinsics](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon-intrinsics.c) were integrated with new checks for CIR codegen. Part of #185382	2026-03-27 21:18:24 +00:00
Justin Stitt	8b395a7755	[Clang] Ensure pattern exclusion priority over OBT (#188390 ) Make sure pattern exclusions have priority over the overflow behavior types when deciding whether or not to emit truncation checks. Accomplish this by carrying an extra field through `ScalarConversionOpts` which we later check before emitting instrumentation.	2026-03-27 13:51:21 -07:00
Xinlong Chen	26f26400d9	[CIR][AArch64] Upstream NEON Maximum builtins (#188503 ) Implement CIR codegen for `vmax_`, `vmaxq_`, `vmaxnm_`, `vmaxnmq_` AArch64 NEON builtins. Part of https://github.com/llvm/llvm-project/issues/185382	2026-03-27 10:54:13 +00:00
Jiahao Guo	013cf4fd1f	[CIR][AArch64] Support BF16 Neon types and lower vdup lane builtins (#187460 ) Part of https://github.com/llvm/llvm-project/issues/185382. Lower: - [vdup_n_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdup_n_bf16) - [vdupq_n_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdupq_n_bf16) - [vdup_lane_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdup_lane_bf16) - [vdupq_lane_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdupq_lane_bf16) - [vdup_laneq_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdup_laneq_bf16) - [vdupq_laneq_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdupq_laneq_bf16) and add tests in [bf16-getset.c](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon/bf16-getset.c). ## Approach ### `vdup_n_bf16` / `vdupq_n_bf16` These are not NEON builtins — they are regular `always_inline` functions defined in `arm_neon.h` that expand to vector aggregate initialization (`{v, v, v, v}`), so they work through the existing generic vector codegen path without requiring any builtin-specific handling. I just added CHECK lines in `bf16-getset.c` to verify the existing output is correct. ### `vdup_lane_bf16` / `vdupq_lane_bf16` / `vdup_laneq_bf16` / `vdupq_laneq_bf16` These are mapped (via `NEONEquivalentIntrinsicMap`) to the generic `splat_lane_v` / `splatq_lane_v` / `splat_laneq_v` / `splatq_laneq_v` builtins, which are handled in `emitCommonNeonBuiltinExpr`. 
I followed the approach used in both the OG codegen (`ARM.cpp`) and the [clangir incubator](https://github.com/nicovank/clangir): OG codegen in `ARM.cpp`: ```cpp switch (BuiltinID) { default: break; case NEON::BI__builtin_neon_splat_lane_v: case NEON::BI__builtin_neon_splat_laneq_v: case NEON::BI__builtin_neon_splatq_lane_v: case NEON::BI__builtin_neon_splatq_laneq_v: { auto NumElements = VTy->getElementCount(); if (BuiltinID == NEON::BI__builtin_neon_splatq_lane_v) NumElements = NumElements * 2; if (BuiltinID == NEON::BI__builtin_neon_splat_laneq_v) NumElements = NumElements.divideCoefficientBy(2); Ops[0] = Builder.CreateBitCast(Ops[0], VTy); return EmitNeonSplat(Ops[0], cast<ConstantInt>(Ops[1]), NumElements); } ``` clangir incubator in `CIRGenBuiltinAArch64.cpp`: ```cpp case NEON::BI__builtin_neon_splat_lane_v: case NEON::BI__builtin_neon_splat_laneq_v: case NEON::BI__builtin_neon_splatq_lane_v: case NEON::BI__builtin_neon_splatq_laneq_v: { uint64_t numElements = vTy.getSize(); if (builtinID == NEON::BI__builtin_neon_splatq_lane_v) numElements = numElements << 1; if (builtinID == NEON::BI__builtin_neon_splat_laneq_v) numElements = numElements >> 1; ops[0] = builder.createBitcast(ops[0], vTy); return emitNeonSplat(builder, getLoc(e->getExprLoc()), ops[0], ops[1], numElements); } ``` The call site for `splat_lane_v` already existed in `emitCommonNeonBuiltinExpr`, but had two issues: 1. `emitNeonSplat` was called but never defined. I added two helper functions (ported from the clangir incubator): `getIntValueFromConstOp` to extract the integer lane index from a CIR constant, and `emitNeonSplat` to build a splat shuffle mask and perform a `cir.vec.shuffle`. 2. The call site used `getLoc(e->getExprLoc())`, which is invalid because `emitCommonNeonBuiltinExpr` is a static free function, not a `CIRGenFunction` member. Fixed to use `cgf.getBuilder()` and the pre-computed `loc` variable. 
Additionally, I found that `NeonTypeFlags::BFloat16` and `NeonTypeFlags::Float16` were unhandled in `getNeonType`, which would cause the vector type to be unresolved for bf16/f16 intrinsics. I added the handling following the same pattern as the OG codegen: ```cpp case NeonTypeFlags::BFloat16: if (allowBFloatArgsAndRet) return cir::VectorType::get(cgf->getCIRGenModule().bFloat16Ty, v1Ty ? 1 : (4 << isQuad)); return cir::VectorType::get(cgf->uInt16Ty, v1Ty ? 1 : (4 << isQuad)); case NeonTypeFlags::Float16: if (hasLegalHalfType) return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty, v1Ty ? 1 : (4 << isQuad)); return cir::VectorType::get(cgf->uInt16Ty, v1Ty ? 1 : (4 << isQuad)); ``` When `allowBFloatArgsAndRet` is true, we use the native `cir::BF16Type`; otherwise we fall back to `u16i`. The same logic applies to `Float16` with `hasLegalHalfType`.	2026-03-27 07:16:49 +00:00
Lei Huang	e9cb7782b4	[NFC] Move PowerPC sema tests to test/Sema/PowerPC subdir (#188639 )	2026-03-26 15:27:49 -04:00
sskzakaria	8d1314f96d	[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX512 VPTESTM intrinsics to be used in constexpr #162071 (#174021 ) Adding Constexpr tests for ``` _mm_test_epi8_mask _mm_mask_test_epi8_mask _mm_test_epi16_mask _mm_mask_test_epi16_mask _mm_test_epi64_mask _mm_mask_test_epi32_mask _mm_test_epi32_mask _mm_mask_test_epi64_mask _mm_testn_epi8_mask _mm_mask_testn_epi8_mask _mm_testn_epi16_mask _mm_mask_testn_epi16_mask _mm_testn_epi64_mask _mm_mask_testn_epi32_mask _mm_testn_epi32_mask _mm_mask_testn_epi64_mask _mm256_test_epi8_mask _mm256_mask_test_epi8_mask _mm256_test_epi16_mask _mm256_mask_test_epi16_mask _mm256_test_epi64_mask _mm256_mask_test_epi32_mask _mm256_test_epi32_mask _mm256_mask_test_epi64_mask _mm256_testn_epi8_mask _mm256_mask_testn_epi8_mask _mm256_testn_epi16_mask _mm256_mask_testn_epi16_mask _mm256_testn_epi64_mask _mm256_mask_testn_epi32_mask _mm256_testn_epi32_mask _mm256_mask_testn_epi64_mask _mm512_test_epi8_mask _mm512_mask_test_epi8_mask _mm512_test_epi16_mask _mm512_mask_test_epi16_mask _mm512_test_epi64_mask _mm512_mask_test_epi32_mask _mm512_test_epi32_mask _mm512_mask_test_epi64_mask _mm512_testn_epi8_mask _mm512_mask_testn_epi8_mask _mm512_testn_epi16_mask _mm512_mask_testn_epi16_mask _mm512_testn_epi64_mask _mm512_mask_testn_epi32_mask _mm512_testn_epi32_mask _mm512_mask_testn_epi64_mask ``` FIXES #162071	2026-03-26 14:20:45 +00:00
Akira Hatanaka	6c8940ccad	Add support for anyAppleOS availability (#181953 ) The number of Apple platforms has grown over the years, resulting in availability annotations becoming increasingly verbose. Now that OS version names have been unified starting with version 26.0, this patch introduces a shorthand syntax that applies availability across all Apple platforms: ``` // Declaration. void foo __attribute__((availability(anyAppleOS, introduced=26.0))); // Guard. if (__builtin_available(anyAppleOS 27.0, *)) ``` Implementation: The `anyAppleOS` platform name is expanded at parse time into implicit platform-specific availability attributes for the target platform. For example, when targeting `macOS`, `anyAppleOS` creates an implicit `macOS` availability attribute with the same version. A priority system ensures correct attribute merging. Attributes expanded from `anyAppleOS` have lower priority than existing availability attributes: - Direct platform-specific attributes on declarations - Platform-specific attributes from #pragma clang attribute push - Attributes inferred from other platforms Among `anyAppleOS` attributes themselves, direct `anyAppleOS` annotations have higher priority than `anyAppleOS` applied through `#pragma clang attribute push`. The minimum supported version for `anyAppleOS` is 26.0. Versions older than 26.0 are rejected with a diagnostic error. `AvailabilityAttr` gains an `origAnyAppleOSVersion` field that stores the original `anyAppleOS` version when a platform-specific availability attribute is implicitly derived from an `anyAppleOS` annotation. This field is used in `@available`/`__builtin_available` fix-it hints to emit the `anyAppleOS` platform name and version rather than the expanded platform-specific name. For `__builtin_available` checks, `anyAppleOS` is lowered to platform-specific version checks in CodeGen. This reduces the burden of adding availability annotations to new APIs in Apple's SDKs and simplifies guards in applications. 
rdar://159386357	2026-03-25 09:26:05 -07:00
Ard Biesheuvel	9a1ebae029	[AARCH64] Support TPIDR_EL0 and TPIDRRO_EL0 as stack protector sysregs (#188054 ) Even though the command line option suggests that arbitrary system registers may be chosen, the sysreg option for the stack protector guard currently only permits SP_EL0, as this is what the Linux kernel uses. While it makes no sense to permit arbitrary system registers here (which usually have side effects), there is a desire to switch to TPIDR_EL0 or TPIDRRO_EL0 from the Linux side, both of which are part of the base v8.0 AArch64 ISA, and can hold arbitrary 64-bit values without side effects. So add TPIDR_EL0 and TPIDRRO_EL0 to the set of accepted arguments for the -mstack-protector-guard-reg= command line option. For good measure, add TPIDR_EL1, TPIDR_EL2, FAR_EL1 and FAR_EL2 as well, all of which could potentially be useful to privileged software such as the Linux kernel to stash a per-thread pointer to the stack protector guard value. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2026-03-25 08:57:12 -07:00
Owen Anderson	ca9ac0e24a	[CHERI] Allow @llvm.returnaddress to return a pointer in any address space. (#188464 ) Clang now constructs calls to it using the default program address space from the DataLayout. Co-authored-by: Alex Richardson <alexrichardson@google.com>	2026-03-25 13:59:38 +00:00
Lei Huang	993b110502	[PowerPC] Add target feature validation for builtins in Sema (#187371 ) Adds early target feature checking for PowerPC builtins during semantic analysis, catching missing target features before code generation and providing better error messages to users. Assisted by AI.	2026-03-24 14:31:42 -04:00
Craig Topper	6c6b4c154c	[RISCV] Disable rounding of aggregate return/arguments to iXLen. (#184736 ) If the type is rounded to iXLen, an additional zext instruction is generated. For example, https://godbolt.org/z/bG7vG4dvM	2026-03-23 19:39:21 -07:00
Lukacma	9426fc19af	[AArch64] Fix _sys implementation and MRS/MSR Sema checks (#187290 ) This patch fixes the lowering of the _sys builtin, which used to lower into an invalid MSR S1... instruction. This was fixed by adding a new sys LLVM intrinsic and proper lowering into the sys instruction and its aliases. I also fixed the Sema checks for the _sys, _ReadStatusRegister and _WriteStatusRegister builtins so they correctly catch invalid use cases.	2026-03-23 10:12:41 +00:00
Simon Pilgrim	9687ef3693	[clang][x86] Allow AVX512 expand intrinsics to be used in constexpr (#187946 ) Fixes #163734	2026-03-23 08:28:55 +00:00
Zihao Wang	b1cf9b0835	[Clang] Support constexpr for AVX512 compress intrinsics (#187656 ) Fixes #163732	2026-03-22 14:27:27 +00:00
Rana Pratap Reddy	df9eb79970	[Clang][AMDGPU] Lower `__amdgpu_texture_t` to `<8 x i32>` instead of ptr addrspace(0) (#187774 ) Fix the IR lowering for `__amdgpu_texture_t` to generate a single 256-bit load instead of a double indirection through a flat pointer. Previously, `__amdgpu_texture_t` was lowered to `ptr addrspace(0)` (64-bit flat pointer), which caused the double load and indirection. Using the same reproducer as #187697: ```c #define TSHARP __constant uint * // Old tsharp handling: // #define LOAD_TSHARP(I) (__constant uint8 )I #define LOAD_TSHARP(I) (__constant __amdgpu_texture_t )I float4 test_image_load_1D(TSHARP i, int c) { return __builtin_amdgcn_image_load_1d_v4f32_i32(15, c, LOAD_TSHARP(i), 0, 0); } ``` Old output: ```llvm define hidden <4 x float> @test_image_load_1D(ptr addrspace(4) noundef readonly captures(none) %i, i32 noundef %c) local_unnamed_addr #0 { entry: %0 = load ptr, ptr addrspace(4) %i, align 32, !tbaa !9 %1 = addrspacecast ptr %0 to ptr addrspace(1) %tex.rsrc.val = load <8 x i32>, ptr addrspace(1) %1, align 32 %2 = tail call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32.v8i32(i32 15, i32 %c, <8 x i32> %tex.rsrc.val, i32 0, i32 0) ret <4 x float> %2 } ``` This matches the old `__constant uint8 *` behavior. With this fix, the new output is: ```llvm define hidden <4 x float> @test_image_load_1D(ptr addrspace(4) noundef readonly captures(none) %0, i32 noundef %1) local_unnamed_addr #0 { %3 = load <8 x i32>, ptr addrspace(4) %0, align 32, !tbaa !10 %4 = tail call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32.v8i32(i32 15, i32 %1, <8 x i32> %3, i32 0, i32 0) ret <4 x float> %4 } ``` Fixes #187697	2026-03-21 22:21:12 +05:30
CarolineConcatto	d96722b660	[LLVM] Improve IR parsing and printing for target memory locations (#176968 ) This patch adds support for specifying all target memory locations using a single IR spelling such as: ``` memory(target_mem: read) ``` This form is not supported in TableGen, but it is now accepted by the IR parser. When the parser encounters target_mem, it expands it to all target-memory locations (e.g., target_mem0, target_mem1, …). Printing behavior: When all target-memory locations share the same ModRef value, the printer now collapses them into a single entry: ``` memory(target_mem: read) ``` Otherwise, each target memory location is printed separately. Rejected IR: ``` memory(target_mem0: write, target_mem: read) ``` This is invalid because the default access kind for the target memory group must appear first.	2026-03-19 17:29:54 +00:00
Wael Yehia	495c518b96	[FMV][AIX] Implement target_clones (cpu-only) (#177428 ) This PR implements Function Multi-versioning on AIX using `__attribute__ ((target_clones(<feature-list>)))`. Initially, we will only support specifying a cpu in the version list. Feature strings (such as "altivec" or "isel") on target_clones will be implemented in a future PR. Accepted syntax: ``` __attribute__((target_clones(<OPTIONS>))) ``` where `<OPTIONS>` is a comma-separated list of strings, each of which is either: 1) the default string `"default"` 2) a cpu string `"cpu=<CPU>"`, where `<CPU>` is a value accepted by the `-mcpu` flag. For example, specifying the following on a function ``` __attribute__((target_clones("default", "cpu=power8", "cpu=power9"))) int foo(int x) { return x + 1; } ``` would generate 3 versions of `foo`: (1) `foo.default`, (2) `foo.cpu_power8`, and (3) `foo.cpu_power9`, an IFUNC `foo`, and the resolver function `foo.resolver` for the IFUNC, which chooses one of the three versions at runtime. --------- Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>	2026-03-17 23:15:15 -04:00
Justin Stitt	18c8b8d81d	[Clang] Add __ob_trap support for implicit integer sign change (#185772 ) The `__ob_trap` type specifier can be used to trap (or warn with sanitizers) when overflow or truncation occurs on the specified type. There was a gap in coverage for this with the `-fsanitize=implicit-integer-sign-change` sanitizer. Fix this by carrying around `__ob_trap` information through `EmitIntegerSignChange()` which allows us to properly trap or warn.	2026-03-17 11:28:53 -07:00
albertbolt1	6e17b2ef33	[CIR][AArch64] Upstream NEON shift left builtins (#186406 ) This PR adds CIR generation for the following AArch64 NEON builtins: __builtin_neon_vshld_n_s64 and __builtin_neon_vshld_n_u64 (constant shifts): extract the constant value and use it directly for the shift left. __builtin_neon_vshld_s64 and __builtin_neon_vshld_u64 (variable shifts): there is an existing function that handles SISD (Single Instruction Single Data) builtins, which is reused to create the right CIR instructions. __builtin_neon_vshld_s64 -- call i64 @llvm.aarch64.neon.sshl.i64(i64 [[A]], i64 [[B]]) __builtin_neon_vshld_u64 -- call i64 @llvm.aarch64.neon.ushl.i64(i64 [[A]], i64 [[B]]) Added test cases in intrinsics.c based on the test cases present in https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon-shifts.c. Before this change the builtins gave a not-implemented error; after it the error is gone and the code succeeds. Ran the test cases using ``` bin/llvm-lit -v \ /Users/albertbolt/projects/llvm-project/clang/test/CodeGen/AArch64/neon/intrinsics.c ``` #185382 --------- Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>	2026-03-17 13:43:14 +00:00
Andrzej Warzyński	0fa9a7797b	[Clang][AArch64] Update comments in tests (nfc) (#186885 )	2026-03-16 21:21:34 +00:00
Jiahao Guo	04797bc692	[CIR][AArch64] Lower BF16 vduph lane builtins (#185852 ) Part of #185382. Lower `__builtin_neon_vduph_lane_bf16` and `__builtin_neon_vduph_laneq_bf16` in ClangIR to `cir.vec.extract`, and add dedicated AArch64 Neon BF16 tests. This is my first LLVM PR, so I'd really appreciate any suggestions on the implementation, test structure, or general LLVM contribution style.	2026-03-16 16:50:41 +00:00
Andrzej Warzyński	a1054ec627	[clang][AArch64] Update label in test (nfc) (#186759 )	2026-03-16 11:22:56 +00:00
Henrich Lauko	3bc216c29c	[CIR] Split CIR_UnaryOp into individual operations (#185280 ) Split the monolithic cir.unary operation (which dispatched on a UnaryOpKind enum) into four separate operations: cir.inc, cir.dec, cir.minus, and cir.not. Changes: - Add CIR_UnaryOpInterface with getInput()/getResult() methods - Add CIR_UnaryOp and CIR_UnaryOpWithOverflowFlag base classes - Define IncOp, DecOp, MinusOp, NotOp with per-op folds - Add Involution trait to NotOp for not(not(x)) -> x folding - Replace createUnaryOp() with createInc/Dec/Minus/Not builders - Split LLVM lowering into four separate patterns - Split LoweringPrepare complex-type handling per unary op - Update CIRCanonicalize and CIRSimplify for new op types - Update all codegen files to use bool params instead of UnaryOpKind - Remove CIR_UnaryOpKind enum and old CIR_UnaryOp definition Assembly format change: cir.unary(inc, %x) nsw : !s32i, !s32i -> cir.inc nsw %x : !s32i cir.unary(not, %x) : !u32i, !u32i -> cir.not %x : !u32i	2026-03-14 23:50:43 +01:00
Prabhu Rajasekaran	60669c1cfe	Fix callee type generation (#186272 ) The callee_type metadata is expected to be a list of generalized type metadata by the IR verifier. But for indirect calls with internal linkage the type metadata is just an integer. Avoid including them in callee_type metadata. This will reduce the precision of the generated call graph, as the edges to internal linkage functions whose addresses were taken will no longer be present. We need to handle this in the future.	2026-03-13 16:24:22 -07:00
Lei Huang	c3a13616c6	XFAIL on AIX: clang/test/CodeGen/distributed-thin-lto/pass-plugin.ll (#186452 )	2026-03-13 13:04:31 -04:00
Lei Huang	097e786016	XFAIL clang/test/CodeGen/distributed-thin-lto/pass-plugin.ll (#186425 ) Failing on AIX as it can't find the new symbol in the exported list. XFAIL to bring the bots green while we investigate. Test introduced in: https://github.com/llvm/llvm-project/pull/183525	2026-03-13 11:50:13 -04:00
paperchalice	26ac669101	[LLVM] Remove "no-nans-fp-math" attribute support (#186285 ) Now all `NoNaNsFPMath` uses have been removed, remove this attribute.	2026-03-13 09:29:28 +00:00
Justin Stitt	6c35a6736c	[Clang] Check sanitizer ignorelist for divrem overflow (#185721 ) Instrumentation emitted for overflow by division was not checking with the sanitizer case list's type entries. The original type-based ignorelist support (#107332) added `isTypeIgnoredBySanitizer` calls to `CanElideOverflowCheck`, which covers `+`, `-`, `*`, `++`, `--`. However, division and remainder have a separate code path in `EmitUndefinedBehaviorIntegerDivAndRemCheck` that never calls `CanElideOverflowCheck` or checks the ignorelist directly. Add a check so that the SCL is honored for the div/rem case.	2026-03-12 14:42:32 -07:00
Lei Huang	bf85f52fbd	[PowerPC] Update dmr builtin names (#183160 ) Remove `_mma` from the following built-ins as they are not related to MMA: * __builtin_mma_dmsetdmrz * __builtin_mma_dmmr * __builtin_mma_dmxor * __builtin_mma_build_dmr * __builtin_mma_disassemble_dmr AI Assisted.	2026-03-12 12:54:07 -04:00
Artem Belevich	595b961400	[CUDA] Use monotonic ordering for __nvvm_atom* builtins (#185822 ) CUDA's __nvvm_atom* builtins are expected to produce atomic operations with relaxed ordering. However, Clang lowered them as atomicrmw and cmpxchg with the default seq_cst ordering. That mismatch went unnoticed because until recently the NVPTX back end was unable to lower all atomic instructions correctly, and despite using `seq_cst` ordering in IR we ended up generating the intended PTX instructions with relaxed ordering. It worked well enough until https://github.com/llvm/llvm-project/pull/179553 implemented correct NVPTX atomic lowering. That, in turn, caused a severe performance regression for code that relied on these builtins. Thanks to @akshayrdeodhar for figuring out what happened. Switching __nvvm_atom* builtins to generate atomic instructions with monotonic ordering matches the expected semantics of the builtins, and restores performance of the generated code. See: https://github.com/llvm/llvm-project/pull/179553#issuecomment-4035193968	2026-03-12 09:48:09 -07:00
Andrzej Warzyński	5e887716b0	[Clang][AArch64] Remove duplicate CodeGen test for bf16 get/set intrinsics (#186084 ) The following test files contain identical test bodies (aside from the RUN lines): * clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c * clang/test/CodeGen/arm-bf16-getset-intrinsics.c The differences in the RUN lines do not appear to be relevant for the tested functionality. This change keeps a single test file and simplifies its RUN lines to match the generic style used in clang/test/CodeGen/AArch64/neon. This also moves toward unifying and reusing RUN lines across tests.	2026-03-12 13:00:28 +00:00
Matt Arsenault	7cb3005ba2	AMDGPU: Add dereferenceable attribute to dispatch ptr intrinsic (#185955 ) Stop manually setting it on the callsite in clang.	2026-03-12 07:28:39 +01:00

10562 Commits