10562 Commits

Author SHA1 Message Date
neonetizen
e11a31f4c7
[CIR][AArch64] Lower FP16 vduph lane intrinsics (#186955)
From #185382 

Lower `vduph_lane_f16` and `vduph_laneq_f16` to `cir::VecExtractOp`

Tests moved from `v8.2a-neon-intrinsics-generic.c` to a new CIR-enabled
test file.

I followed the notes made in #185852 (BF16).
2026-04-06 19:12:34 +01:00
Andrzej Warzyński
38c53b3eb9
[clang][cir][nfc] Fix comments, add missing EOF (#190623) 2026-04-06 18:06:57 +01:00
albertbolt1
8d7823ea8f
[CIR][AArch64] Added vector intrinsics for shift left (#187516)
Added vector intrinsics for 
vshlq_n_s8
vshlq_n_s16
vshlq_n_s32
vshlq_n_s64
vshlq_n_u8
vshlq_n_u16
vshlq_n_u32
vshlq_n_u64

vshl_n_s8
vshl_n_s16
vshl_n_s32
vshl_n_s64
vshl_n_u8
vshl_n_u16
vshl_n_u32
vshl_n_u64

These cover all the vector intrinsics for constant shift left.

The method followed:

1) The vectors for quad words are of the form `64x2`, `32x4`, `16x8`, or
`8x16`, and the shift is a constant value. Shift left needs both operands
to be vectors, so the constant shift is converted into a vector of the
matching form (for `64x2`, the constant becomes a `64x2` vector). This
process is also called **splat**.
2) After the splat, the lhs and rhs have the same size, so the shift left
can be applied.
3) There is one issue: `ops[0]` is not of the right type. For quad words
it falls back to the default `int8x16` in the function, so it is converted
to the required type with a bitcast. `8x16` and `64x2` have the same total
width, so the bitcast yields the vector in the right form.
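
The three steps above can be sketched in portable C. This is only a model: the 128-bit register is a plain byte array, and `vec128`, `splat_u32`, and `shl_u32x4` are hypothetical names invented for the illustration, not CIR or NEON APIs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit NEON register as 16 raw bytes (8x16). */
typedef struct { uint8_t bytes[16]; } vec128;

/* Step 1: splat, i.e. broadcast one constant into all four 32-bit lanes. */
void splat_u32(uint32_t value, uint32_t lanes[4]) {
    for (int i = 0; i < 4; ++i)
        lanes[i] = value;
}

/* Steps 2 and 3: view the 8x16 bytes as 32x4 and shift lane by lane.
 * Assumes shift < 32, as required for a constant left shift of 32-bit lanes. */
vec128 shl_u32x4(vec128 input, uint32_t shift) {
    uint32_t lhs[4], rhs[4];
    vec128 result;

    memcpy(lhs, input.bytes, sizeof lhs);   /* "bitcast" 8x16 -> 32x4 */
    splat_u32(shift, rhs);                  /* constant -> 32x4 vector */
    for (int i = 0; i < 4; ++i)
        lhs[i] <<= rhs[i];                  /* both operands now share a shape */
    memcpy(result.bytes, lhs, sizeof lhs);  /* back to the raw byte view */
    return result;
}
```

The `memcpy` calls stand in for the bitcast: they change how the same 16 bytes are interpreted without changing the bytes themselves.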


Wrote the test cases for all the intrinsics listed above

#185382
2026-04-06 17:00:38 +01:00
Eli Friedman
9471fabf8a
[clang] Fix issues with const/pure on varargs function. (#190252)
There are two related issues here. On the declaration/definition side,
we need to make sure the markings are conservative. Then on the caller
side, we need to make sure we don't access parameters that don't exist.

Fixes #187535.
2026-04-03 13:57:35 -07:00
Florian Hahn
6476619f30
[Matrix] Use matrix element type for TBAA nodes. (#190029)
Matrix loads and stores are accesses of their element types. Emit TBAA
nodes using their element type to allow more precise TBAA alias
analysis.

PR: https://github.com/llvm/llvm-project/pull/190029
2026-04-03 20:11:04 +00:00
Amr Hesham
2108252f0e
[clang] Fixed a crash when explicitly casting to atomic complex (#172163)
Fixed a crash when explicitly casting a scalar to an atomic complex.

resolve: #114885
2026-04-03 19:28:20 +02:00
Amr Hesham
f2dff15995
[clang] Fixed a crash when explicitly casting between atomic complex types (#172210)
Fixed a crash when explicitly casting between atomic complex types

resolve: #172208
2026-04-02 22:55:43 +02:00
Justin Stitt
43233b8aae
[Clang] Add missing __ob_trap check for sign change (#188340)
Add a missing OBTrapInvolved check before EmitIntegerSignChangeCheck().

This is considered "missing" as a previous attempt (https://github.com/llvm/llvm-project/pull/185772) to properly add an `__ob_trap` backdoor missed this particular instance.

This backdoor is needed because we want `__ob_trap` types to be picky about implicit conversions (including implicit sign change):

```c
	unsigned int __ob_trap big = 4294967295;
	(signed int)big; // should trap!
```

Move the `OBTrapInvolved` setup to the top of the function so it can be used in all the places we need it.
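
For illustration only, the condition that a sign-change check looks for can be written in plain C. The helper name `sign_changes_u32_to_i32` is hypothetical, and the sketch does not reflect how Clang emits the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether converting `value` to a signed
 * 32-bit integer would turn a non-negative value into a negative one. */
bool sign_changes_u32_to_i32(uint32_t value) {
    return value > (uint32_t)INT32_MAX;
}
```

With this model, `4294967295` from the snippet above is flagged, while any value up to `INT32_MAX` is not.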
2026-04-02 10:51:46 -07:00
Tony Guillot
42b6a6faaa
[Clang] Fixed the behavior of C23 auto when an array type was specified for a char * (#189722)
At the time of the implementation of
[N3007](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3007.htm) in
Clang, when an array type was specified, an error was emitted unless the
deduced type was a `char *`.
After further inspection of the C standard, it turns out that the
inferred type of a `char[]` should be deduced as `char *`, so an error
should be emitted whenever an array type is specified with `auto`.

This now invalidates the following cases:
```c
auto s1[] = "test";
auto s2[4] = "test";
auto s3[5] = "test";
```

Fixes #162694
2026-04-02 18:40:21 +02:00
Trung Nguyen
9dc1da6e87
[clang] Add support for MSVC force inline attrs (#185282)
Add support for `[[msvc::forceinline]]` and
`[[msvc::forceinline_calls]]`.

`[[msvc::forceinline]]` is equivalent to Microsoft's `__forceinline`
when placed before a function declaration.
Unlike `__forceinline`, `[[msvc::forceinline]]` works with lambdas.

`[[msvc::forceinline_calls]]` is similar to `[[clang::always_inline]]`
but only works on statements.

Both are implemented as aliases of `[[clang::always_inline]]` with
special checks.

Fixes #186539.
2026-04-02 16:42:26 +02:00
wanglei
76fc936175
[Clang][LoongArch] Align LSX/LASX built-in signatures with intrinsic types to avoid lax conversions (#189900)
Update the built-in signatures in BuiltinsLoongArchLSX.def and
BuiltinsLoongArchLASX.def to precisely match the vector types used in
the corresponding intrinsic headers (lsxintrin.h and lasxintrin.h).

This alignment ensures that these intrinsics can be compiled
successfully even when -flax-vector-conversions=none is specified, since
the built-in arguments no longer rely on implicit vector type
conversions.

Added new test cases to verify the macro-defined LSX/LASX
intrinsic interfaces under -flax-vector-conversions=none.

Fixes #189898
2026-04-02 16:11:22 +08:00
Florian Hahn
b46f8fa622
[Matrix] Add tests checking TBAA emission for matrix types (NFC). (#189953)
PR: https://github.com/llvm/llvm-project/pull/189953
2026-04-01 13:31:09 +00:00
VASU SHARMA
2313989499
[UBSAN] [NFC] pre-commit tests for null, alignment, bounds checks (#176210)
PR to add precommit tests to document current UBSAN behavior for
aggregate copy operations.

The test covers:
- Sanitizers: null, alignment, bounds
- Type variants: plain, _Atomic (C), volatile (C)
- Operand forms: arr[idx], *ptr
- Operations: assignment, initialization, initializer list, variadic
args, nested member access
- C++ specific: direct/brace/copy-list init, new expressions, casts,
operator=, virtual base init
- Bounds checking: in-bounds access, past-the-end access (index ==
size), beyond bounds access, dynamic index

---------

Co-authored-by: vasu-ibm <Vasu.Sharma2@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
2026-04-01 14:04:28 +05:30
Alexis Engelke
74e84c0cf5
[Clang] Fix getTerminator() use for -fasync-exceptions (#189644) 2026-03-31 12:50:25 +00:00
Alex Voicu
18e6958903
[SPIRV][AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (#134016)
This change adds two builtins for AMDGPU:

- `__builtin_amdgcn_processor_is`, which is similar in observable
behaviour to `__builtin_cpu_is`, except that it is never "evaluated"
at run time;
- `__builtin_amdgcn_is_invocable`, which is behaviourally similar to
`__has_builtin`, except that it is not a macro (i.e. not evaluated at
preprocessing time).

Neither of these are `constexpr`, even though when compiling for
concrete (i.e. `gfxXXX` / `gfxXXX-generic`) targets they get evaluated
in Clang, so they shouldn't tear the AST too badly / at all for
multi-pass compilation cases like HIP. They can only be used in specific
contexts (as args to control structures).

The motivation for adding these is two-fold:

- as a nice to have, it provides an AST-visible way to incorporate
architecture specific code, rather than having to rely on macros and the
preprocessor, which burn in the choice quite early;
- as a must have, it allows featureful AMDGCN flavoured SPIR-V to be
produced, where target specific capability is guarded and chosen or
discarded when finalising compilation for a concrete target; this is
built atop the Specialization Constant concept which is described in the
SPIR-V specification under section [2.12
Specialization](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_specialization_2)

I've tried to keep the overall footprint of the change small. The
changes to Sema are a bit unpleasant, but there was a strong desire to
have Clang validate these, and to constrain their uses, and this was the
most compact solution I could come up with (suggestions welcome).

---------

Co-authored-by: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>
Co-authored-by: Voicu <avoicu@amd.com>
2026-03-30 23:02:26 +01:00
Jean-Michel Gorius
65cb5c3975
[clang][x86] Fix the return type of the cvtpd2dq builtin (#189254)
The CVTPD2DQ instruction converts packed 64-bit floating-point values to
packed 32-bit signed integer values. This patch fixes the return type of
the corresponding builtin, which previously returned a vector of two
64-bit signed integers. The new behavior is in line with the return type
of the CVTTPD2DQ builtin.
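
A scalar sketch of the result shape may help. It models the truncating CVTTPD2DQ form, because that needs no rounding-mode handling; the lane layout (four 32-bit lanes, upper two zero) is the point. The names `v4i32_model`, `truncate_to_i32`, and `cvttpd2dq_model` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the result: four 32-bit lanes, of which only the
 * low two carry converted values and the high two are zero. */
typedef struct { int32_t lane[4]; } v4i32_model;

/* Truncating conversion of one double. NaN and out-of-range inputs yield
 * INT32_MIN, the x86 "integer indefinite" value. */
int32_t truncate_to_i32(double x) {
    if (!(x > -2147483649.0 && x < 2147483648.0))
        return INT32_MIN;
    return (int32_t)x;
}

/* Two packed doubles in, four packed 32-bit integers out. */
v4i32_model cvttpd2dq_model(const double src[2]) {
    v4i32_model out = {{0, 0, 0, 0}};
    out.lane[0] = truncate_to_i32(src[0]);
    out.lane[1] = truncate_to_i32(src[1]);
    return out;
}
```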
2026-03-30 10:38:21 +00:00
Owen Anderson
3f2e24726a
[CHERI] Allow @llvm.clear_cache to accept pointers in address spaces other than 0. (#189283)
Co-Authored-by: Jessica Clarke <jrtc27@jrtc27.com>
2026-03-30 09:20:49 +02:00
Kamran Yousafzai
1264ffc4cc
[clang][RISC-V] fixed fp calling convention for fpcc eligible structs for risc-v (#110690)
The code generated for calls with FPCC eligible structs as arguments
doesn't consider the bitfield, which results in a store crossing the
boundary of the memory allocated using alloca.
For example, for the code:
```
struct __attribute__((packed, aligned(1))) S {
   const float  f0;
   unsigned f1 : 1;
};
unsigned  func(struct S  arg)
{
    return arg.f1;
} 
```
The generated IR is:
```
 define dso_local signext i32 @func(
 float [[TMP0:%.*]], i32 [[TMP1:%.*]]) #[[ATTR0:[0-9]+]] {
  [[ENTRY:.*:]]
    [[ARG:%.*]] = alloca [[STRUCT_S:%.*]], align 1
    [[TMP2:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 0
    store float [[TMP0]], ptr [[TMP2]], align 1
    [[TMP3:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 1
    store i32 [[TMP1]], ptr [[TMP3]], align 1
    [[F1:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[ARG]], i32 0, i32 1
    [[BF_LOAD:%.*]] = load i8, ptr [[F1]], align 1
    [[BF_CLEAR:%.*]] = and i8 [[BF_LOAD]], 1
    [[BF_CAST:%.*]] = zext i8 [[BF_CLEAR]] to i32
    ret i32 [[BF_CAST]]
```
Here, `store i32 [[TMP1]], ptr [[TMP3]], align 1` crosses
the boundary of the allocated memory. After optimizations
(EarlyCSEPass), the remaining IR is:
```
 define dso_local noundef signext i32 @func(
 float [[TMP0:%.*]], i32 [[TMP1:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
  [[ENTRY:.*:]]
    ret i32 0
```
The patch trims the second member of the struct after taking into
consideration the bitwidth to decide the appropriate integer type and
the test shows the results of this patch.

Note that the bug is seen only when `f` extension is enabled for FPCC
eligibility.
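
The size mismatch can be checked with a small C sketch, assuming a GCC- or Clang-compatible compiler that honours the `packed` and `aligned` attributes. The two helper functions are hypothetical and only compare byte counts.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the reproducer: a float followed by a 1-bit bitfield. */
struct __attribute__((packed, aligned(1))) S {
    const float f0;
    unsigned f1 : 1;
};

/* Bytes touched when the second member is stored as a full i32 at offset 4,
 * as with the `{ float, i32 }` view used in the IR above. */
size_t bytes_written_by_i32_store(void) {
    return sizeof(float) + sizeof(int32_t);
}

/* Bytes actually reserved by the alloca for `struct S`. */
size_t bytes_allocated(void) {
    return sizeof(struct S);
}
```

If the packed struct is smaller than 8 bytes, as expected here, the `i32` store writes past the end of the allocation.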

Co-authored-by: muhammad.kamran4 <muhammad.kamran@esperantotech.com>
2026-03-27 16:55:11 -07:00
Pau Sum
0f81923735
[CIR][AArch64] Upstream vmull_*/vmull_high_* and vmul_p8/vmul_high_p8 Neon builtins (#188371)
Add CIR generation for AArch64 NEON builtins `vmull_*` and
`vmull_high_*`

The accompanying tests from
[AArch64/neon-intrinsics](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon-intrinsics.c)
were integrated with new checks for CIR codegen.

Part of #185382
2026-03-27 21:18:24 +00:00
Justin Stitt
8b395a7755
[Clang] Ensure pattern exclusion priority over OBT (#188390)
Make sure pattern exclusions have priority over the overflow behavior types when deciding whether or not to emit truncation checks.

Accomplish this by carrying an extra field through `ScalarConversionOpts` which we later check before emitting instrumentation.
2026-03-27 13:51:21 -07:00
Xinlong Chen
26f26400d9
[CIR][AArch64] Upstream NEON Maximum builtins (#188503)
Implement CIR codegen for `vmax_*`, `vmaxq_*`, `vmaxnm_*`, `vmaxnmq_*`
AArch64 NEON builtins.

part of https://github.com/llvm/llvm-project/issues/185382
2026-03-27 10:54:13 +00:00
Jiahao Guo
013cf4fd1f
[CIR][AArch64] Support BF16 Neon types and lower vdup lane builtins (#187460)
Part of https://github.com/llvm/llvm-project/issues/185382.

Lower:
-
[vdup_n_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdup_n_bf16)
-
[vdupq_n_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdupq_n_bf16)
-
[vdup_lane_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdup_lane_bf16)
-
[vdupq_lane_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdupq_lane_bf16)
-
[vdup_laneq_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdup_laneq_bf16)
-
[vdupq_laneq_bf16](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdupq_laneq_bf16)

and add tests in
[bf16-getset.c](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon/bf16-getset.c).

## Approach

### `vdup_n_bf16` / `vdupq_n_bf16`

These are not NEON builtins — they are regular `always_inline` functions
defined in `arm_neon.h` that expand to vector aggregate initialization
(`{v, v, v, v}`), so they work through the existing generic vector
codegen path without requiring any builtin-specific handling. I just
added CHECK lines in `bf16-getset.c` to verify the existing output is
correct.

### `vdup_lane_bf16` / `vdupq_lane_bf16` / `vdup_laneq_bf16` /
`vdupq_laneq_bf16`

These are mapped (via `NEONEquivalentIntrinsicMap`) to the generic
`splat_lane_v` / `splatq_lane_v` / `splat_laneq_v` / `splatq_laneq_v`
builtins, which are handled in `emitCommonNeonBuiltinExpr`.

I followed the approach used in both the OG codegen (`ARM.cpp`) and the
[clangir incubator](https://github.com/nicovank/clangir):

**OG codegen in `ARM.cpp`:**

```cpp
switch (BuiltinID) {
  default: break;
  case NEON::BI__builtin_neon_splat_lane_v:
  case NEON::BI__builtin_neon_splat_laneq_v:
  case NEON::BI__builtin_neon_splatq_lane_v:
  case NEON::BI__builtin_neon_splatq_laneq_v: {
    auto NumElements = VTy->getElementCount();
    if (BuiltinID == NEON::BI__builtin_neon_splatq_lane_v)
      NumElements = NumElements * 2;
    if (BuiltinID == NEON::BI__builtin_neon_splat_laneq_v)
      NumElements = NumElements.divideCoefficientBy(2);

    Ops[0] = Builder.CreateBitCast(Ops[0], VTy);
    return EmitNeonSplat(Ops[0], cast<ConstantInt>(Ops[1]), NumElements);
  }
```

**clangir incubator in `CIRGenBuiltinAArch64.cpp`:**

```cpp
  case NEON::BI__builtin_neon_splat_lane_v:
  case NEON::BI__builtin_neon_splat_laneq_v:
  case NEON::BI__builtin_neon_splatq_lane_v:
  case NEON::BI__builtin_neon_splatq_laneq_v: {
    uint64_t numElements = vTy.getSize();
    if (builtinID == NEON::BI__builtin_neon_splatq_lane_v)
      numElements = numElements << 1;
    if (builtinID == NEON::BI__builtin_neon_splat_laneq_v)
      numElements = numElements >> 1;
    ops[0] = builder.createBitcast(ops[0], vTy);
    return emitNeonSplat(builder, getLoc(e->getExprLoc()), ops[0], ops[1],
                         numElements);
  }
```

The call site for `splat_lane_v` already existed in
`emitCommonNeonBuiltinExpr`, but had two issues:

1. **`emitNeonSplat` was called but never defined.** I added two helper
functions (ported from the clangir incubator): `getIntValueFromConstOp`
to extract the integer lane index from a CIR constant, and
`emitNeonSplat` to build a splat shuffle mask and perform a
`cir.vec.shuffle`.

2. **The call site used `getLoc(e->getExprLoc())`, which is invalid**
because `emitCommonNeonBuiltinExpr` is a static free function, not a
`CIRGenFunction` member. Fixed to use `cgf.getBuilder()` and the
pre-computed `loc` variable.
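
As a rough scalar model of what the splat does, every output element is a copy of one source lane, and the output element count follows the adjustments in the two snippets above. All names here (`splat_kind`, `splat_result_elements`, `splat_lane_u16`) are hypothetical and are not the CIR helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* The four generic builtins handled by the shared case. */
enum splat_kind { SPLAT_LANE, SPLAT_LANEQ, SPLATQ_LANE, SPLATQ_LANEQ };

/* Output element count relative to the source vector. */
size_t splat_result_elements(enum splat_kind kind, size_t source_elements) {
    if (kind == SPLATQ_LANE)
        return source_elements * 2;   /* 64-bit source, 128-bit result */
    if (kind == SPLAT_LANEQ)
        return source_elements / 2;   /* 128-bit source, 64-bit result */
    return source_elements;
}

/* Equivalent to a shuffle whose mask is {lane, lane, ..., lane}. */
void splat_lane_u16(const uint16_t *src, size_t lane,
                    uint16_t *dst, size_t num_elements) {
    for (size_t i = 0; i < num_elements; ++i)
        dst[i] = src[lane];
}
```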

Additionally, I found that `NeonTypeFlags::BFloat16` and
`NeonTypeFlags::Float16` were unhandled in `getNeonType`, which would
cause the vector type to be unresolved for bf16/f16 intrinsics. I added
the handling following the same pattern as the OG codegen:

```cpp
  case NeonTypeFlags::BFloat16:
    if (allowBFloatArgsAndRet)
      return cir::VectorType::get(cgf->getCIRGenModule().bFloat16Ty, v1Ty ? 1 : (4 << isQuad));
    return cir::VectorType::get(cgf->uInt16Ty, v1Ty ? 1 : (4 << isQuad));
  case NeonTypeFlags::Float16:
    if (hasLegalHalfType)
      return cir::VectorType::get(cgf->getCIRGenModule().fP16Ty, v1Ty ? 1 : (4 << isQuad));
    return cir::VectorType::get(cgf->uInt16Ty, v1Ty ? 1 : (4 << isQuad));
```

When `allowBFloatArgsAndRet` is true, we use the native `cir::BF16Type`;
otherwise we fall back to `u16i`. The same logic applies to `Float16`
with `hasLegalHalfType`.
2026-03-27 07:16:49 +00:00
Lei Huang
e9cb7782b4
[NFC] Move PowerPC sema tests to test/Sema/PowerPC subdir (#188639) 2026-03-26 15:27:49 -04:00
sskzakaria
8d1314f96d
[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX512 VPTESTM intrinsics to be used in constexpr #162071 (#174021)
Adding Constexpr tests for 
```
_mm_test_epi8_mask _mm_mask_test_epi8_mask
_mm_test_epi16_mask _mm_mask_test_epi16_mask
_mm_test_epi64_mask _mm_mask_test_epi32_mask
_mm_test_epi32_mask _mm_mask_test_epi64_mask
_mm_testn_epi8_mask _mm_mask_testn_epi8_mask
_mm_testn_epi16_mask _mm_mask_testn_epi16_mask
_mm_testn_epi64_mask _mm_mask_testn_epi32_mask
_mm_testn_epi32_mask _mm_mask_testn_epi64_mask

_mm256_test_epi8_mask _mm256_mask_test_epi8_mask
_mm256_test_epi16_mask _mm256_mask_test_epi16_mask
_mm256_test_epi64_mask _mm256_mask_test_epi32_mask
_mm256_test_epi32_mask _mm256_mask_test_epi64_mask
_mm256_testn_epi8_mask _mm256_mask_testn_epi8_mask
_mm256_testn_epi16_mask _mm256_mask_testn_epi16_mask
_mm256_testn_epi64_mask _mm256_mask_testn_epi32_mask
_mm256_testn_epi32_mask _mm256_mask_testn_epi64_mask

_mm512_test_epi8_mask _mm512_mask_test_epi8_mask
_mm512_test_epi16_mask _mm512_mask_test_epi16_mask
_mm512_test_epi64_mask _mm512_mask_test_epi32_mask
_mm512_test_epi32_mask _mm512_mask_test_epi64_mask
_mm512_testn_epi8_mask _mm512_mask_testn_epi8_mask
_mm512_testn_epi16_mask _mm512_mask_testn_epi16_mask
_mm512_testn_epi64_mask _mm512_mask_testn_epi32_mask
_mm512_testn_epi32_mask _mm512_mask_testn_epi64_mask
```
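
For readers unfamiliar with these intrinsics, the following scalar C sketch models the 128-bit, 8-bit-lane variants. The `_model` function names are hypothetical; the real intrinsics operate on vector registers and mask types.

```c
#include <stdint.h>

/* Bit i of the result is set when (a[i] & b[i]) is non-zero. */
uint16_t test_epi8_mask_model(const uint8_t a[16], const uint8_t b[16]) {
    uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        if ((a[i] & b[i]) != 0)
            mask |= (uint16_t)(1u << i);
    return mask;
}

/* The "testn" form sets bit i when (a[i] & b[i]) is zero instead. */
uint16_t testn_epi8_mask_model(const uint8_t a[16], const uint8_t b[16]) {
    return (uint16_t)~test_epi8_mask_model(a, b);
}

/* The masked form additionally clears every bit not set in `k`. */
uint16_t mask_test_epi8_mask_model(uint16_t k, const uint8_t a[16],
                                   const uint8_t b[16]) {
    return (uint16_t)(k & test_epi8_mask_model(a, b));
}
```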

 FIXES #162071
2026-03-26 14:20:45 +00:00
Akira Hatanaka
6c8940ccad
Add support for anyAppleOS availability (#181953)
The number of Apple platforms has grown over the years, resulting in
availability annotations becoming increasingly verbose. Now that OS
version names have been unified starting with version 26.0, this patch
introduces a shorthand syntax that applies availability across all Apple
platforms:

```
// Declaration.
void foo __attribute__((availability(anyAppleOS, introduced=26.0)));

// Guard.
if (__builtin_available(anyAppleOS 27.0, *))
```

Implementation:

The `anyAppleOS` platform name is expanded at parse time into implicit
platform-specific availability attributes for the target platform. For
example, when targeting `macOS`, `anyAppleOS` creates an implicit
`macOS` availability attribute with the same version.

A priority system ensures correct attribute merging. Attributes expanded
from `anyAppleOS` have lower priority than existing availability
attributes:
- Direct platform-specific attributes on declarations
- Platform-specific attributes from #pragma clang attribute push
- Attributes inferred from other platforms

Among `anyAppleOS` attributes themselves, direct `anyAppleOS`
annotations have higher priority than `anyAppleOS` applied through
`#pragma clang attribute push`.

The minimum supported version for `anyAppleOS` is 26.0. Versions older
than 26.0 are rejected with a diagnostic error.

`AvailabilityAttr` gains an `origAnyAppleOSVersion` field that stores
the original `anyAppleOS` version when a platform-specific availability
attribute is implicitly derived from an `anyAppleOS` annotation. This
field is used in `@available`/`__builtin_available` fix-it hints to emit
the `anyAppleOS` platform name and version rather than the expanded
platform-specific name.

For `__builtin_available` checks, `anyAppleOS` is lowered to
platform-specific version checks in CodeGen.

This reduces the burden of adding availability annotations to new APIs
in Apple's SDKs and simplifies guards in applications.

rdar://159386357
2026-03-25 09:26:05 -07:00
Ard Biesheuvel
9a1ebae029
[AARCH64] Support TPIDR_EL0 and TPIDRRO_EL0 as stack protector sysregs (#188054)
Even though the command line option suggests that arbitrary system
registers may be chosen, the sysreg option for the stack protector guard
currently only permits SP_EL0, as this is what the Linux kernel uses.

While it makes no sense to permit arbitrary system registers here (which
usually have side effects), there is a desire to switch to TPIDR_EL0 or
TPIDRRO_EL0 from the Linux side, both of which are part of the base v8.0
AArch64 ISA, and can hold arbitrary 64-bit values without side effects.

So add TPIDR_EL0 and TPIDRRO_EL0 to the set of accepted arguments for
the -mstack-protector-guard-reg= command line option. For good measure,
add TPIDR_EL1, TPIDR_EL2, FAR_EL1 and FAR_EL2 as well, all of which
could potentially be useful to privileged software such as the Linux
kernel to stash a per-thread pointer to the stack protector guard value.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-03-25 08:57:12 -07:00
Owen Anderson
ca9ac0e24a
[CHERI] Allow @llvm.returnaddress to return a pointer in any address space. (#188464)
Clang now constructs calls to it using the default program address space from the DataLayout.

Co-authored-by: Alex Richardson <alexrichardson@google.com>
2026-03-25 13:59:38 +00:00
Lei Huang
993b110502
[PowerPC] Add target feature validation for builtins in Sema (#187371)
Adds early target feature checking for PowerPC builtins during semantic
analysis, catching missing target features before code generation and
providing better error messages to users.

Assisted by AI.
2026-03-24 14:31:42 -04:00
Craig Topper
6c6b4c154c
[RISCV] Disable rounding of aggregate return/arguments to iXLen. (#184736)
If the type is rounded to iXLen, an additional zext instruction is
generated. For example, https://godbolt.org/z/bG7vG4dvM
2026-03-23 19:39:21 -07:00
Lukacma
9426fc19af
[AArch64] Fix _sys implementation and MRS/MSR Sema checks (#187290)
This patch fixes the lowering of the _sys builtin, which used to lower
into an invalid MSR S1... instruction. This was fixed by adding a new sys
LLVM intrinsic and proper lowering into the sys instruction and its aliases.

I also fixed the Sema checks for the _sys, _ReadStatusRegister and
_WriteStatusRegister builtins so they correctly catch invalid
use cases.
2026-03-23 10:12:41 +00:00
Simon Pilgrim
9687ef3693
[clang][x86] Allow AVX512 expand intrinsics to be used in constexpr (#187946)
Fixes #163734
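
As a reminder of what an expand does, here is a scalar C sketch of the masked form over eight 32-bit lanes. The function name `expand_epi32_model` is hypothetical; it only models the data movement.

```c
#include <stdint.h>

/* Consecutive low elements of `src` are placed into the lanes whose mask
 * bit is set; all other lanes keep the value from `passthrough`.
 * `dst` must not alias `src`. */
void expand_epi32_model(const int32_t passthrough[8], uint8_t mask,
                        const int32_t src[8], int32_t dst[8]) {
    int next = 0;
    for (int i = 0; i < 8; ++i) {
        if (mask & (1u << i))
            dst[i] = src[next++];
        else
            dst[i] = passthrough[i];
    }
}
```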
2026-03-23 08:28:55 +00:00
Zihao Wang
b1cf9b0835
[Clang] Support constexpr for AVX512 compress intrinsics (#187656)
Fixes #163732
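
As a reminder of what a compress does, here is a scalar C sketch of the masked form over eight 32-bit lanes. The function name `compress_epi32_model` is hypothetical; it only models the data movement.

```c
#include <stdint.h>

/* Elements of `src` whose mask bit is set are packed contiguously into the
 * low lanes of `dst`; the remaining high lanes come from `passthrough`.
 * `dst` must not alias `src`. */
void compress_epi32_model(const int32_t passthrough[8], uint8_t mask,
                          const int32_t src[8], int32_t dst[8]) {
    int next = 0;
    for (int i = 0; i < 8; ++i)
        if (mask & (1u << i))
            dst[next++] = src[i];
    for (int i = next; i < 8; ++i)
        dst[i] = passthrough[i];
}
```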
2026-03-22 14:27:27 +00:00
Rana Pratap Reddy
df9eb79970
[Clang][AMDGPU] Lower __amdgpu_texture_t to <8 x i32> instead of ptr addrspace(0) (#187774)
Fix the IR lowering for `__amdgpu_texture_t` to generate a single
256-bit load instead of a double indirection through a flat pointer.

Previously, `__amdgpu_texture_t` was lowered to `ptr addrspace(0)`
(64-bit flat pointer), which caused the double load and indirection.
Using the same reproducer as #187697:

```c
#define TSHARP __constant uint *

// Old tsharp handling:
// #define LOAD_TSHARP(I) *(__constant uint8 *)I

#define LOAD_TSHARP(I) *(__constant __amdgpu_texture_t *)I

float4 test_image_load_1D(TSHARP i, int c) {
  return __builtin_amdgcn_image_load_1d_v4f32_i32(15, c, LOAD_TSHARP(i), 0, 0);
}
```
old output: 

```llvm
define hidden <4 x float> @test_image_load_1D(ptr addrspace(4) noundef readonly captures(none) %i, i32 noundef %c) local_unnamed_addr #0 {
entry:
  %0 = load ptr, ptr addrspace(4) %i, align 32, !tbaa !9
  %1 = addrspacecast ptr %0 to ptr addrspace(1)
  %tex.rsrc.val = load <8 x i32>, ptr addrspace(1) %1, align 32
  %2 = tail call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32.v8i32(i32 15, i32 %c, <8 x i32> %tex.rsrc.val, i32 0, i32 0)
  ret <4 x float> %2
}
```
The fix matches the old `__constant uint8 *` behavior. With it, the new
output is:
```llvm
define hidden <4 x float> @test_image_load_1D(ptr addrspace(4) noundef readonly captures(none) %0, i32 noundef %1) local_unnamed_addr #0 {
  %3 = load <8 x i32>, ptr addrspace(4) %0, align 32, !tbaa !10
  %4 = tail call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32.v8i32(i32 15, i32 %1, <8 x i32> %3, i32 0, i32 0)
  ret <4 x float> %4
}
```

Fixes #187697
2026-03-21 22:21:12 +05:30
CarolineConcatto
d96722b660
[LLVM] Improve IR parsing and printing for target memory locations (#176968)
This patch adds support for specifying all target memory locations using
a single IR spelling such as:
```
memory(target_mem: read)
```

This form is not supported in TableGen, but it is now accepted by the IR
parser.
When the parser encounters target_mem, it expands it to all target-memory
locations (e.g., target_mem0, target_mem1, …).

Printing behavior

When all target-memory locations share the same ModRef value, the printer
now collapses them into a single entry:
```
memory(target_mem: read)
```
Otherwise, each target memory location is printed separately.

Rejected IR:
```
memory(target_mem0: write, target_mem: read)
```
This is invalid because the default access kind for the target memory
group must appear first.
2026-03-19 17:29:54 +00:00
Wael Yehia
495c518b96
[FMV][AIX] Implement target_clones (cpu-only) (#177428)
This PR implements Function Multi-versioning on AIX using `__attribute__
((target_clones(<feature-list>)))`.
Initially, we will only support specifying a cpu in the version list. 
Feature strings (such as "altivec" or "isel") on target_clones will be
implemented in a future PR.

Accepted syntax:
```
__attribute__((target_clones(<OPTIONS>)))
```
where `<OPTIONS>` is a comma separated list of strings, each string is
either:
1) the default string `"default"`
2) a cpu string `"cpu=<CPU>"`, where `<CPU>`is a value accepted by the
`-mcpu` flag.
For example, specifying the following on a function
```
__attribute__((target_clones("default", "cpu=power8", "cpu=power9")))
int foo(int x) { return x + 1; }
```
Would generate 3 versions of `foo`: (1) `foo.default`, (2)
`foo.cpu_power8`, and (3) `foo.cpu_power9`,
an IFUNC `foo`, and the resolver function `foo.resolver`, for the IFUNC,
that chooses one of the three versions at runtime.
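
The dispatch can be pictured with ordinary function pointers. This is only a sketch: the stand-in functions share one body, and `foo_resolver` takes the CPU name as a string argument, which is an assumption made for the example. How the real resolver detects the CPU is not described here.

```c
#include <string.h>

typedef int (*foo_fn)(int);

/* Stand-ins for the three generated versions. In the real feature each one
 * would be compiled for a different target CPU. */
int foo_default(int x)    { return x + 1; }
int foo_cpu_power8(int x) { return x + 1; }
int foo_cpu_power9(int x) { return x + 1; }

/* Hypothetical resolver: picks a version from a CPU name, falling back to
 * the default version when no specific clone matches. */
foo_fn foo_resolver(const char *cpu_name) {
    if (strcmp(cpu_name, "power9") == 0)
        return foo_cpu_power9;
    if (strcmp(cpu_name, "power8") == 0)
        return foo_cpu_power8;
    return foo_default;
}
```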

---------

Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
2026-03-17 23:15:15 -04:00
Justin Stitt
18c8b8d81d
[Clang] Add __ob_trap support for implicit integer sign change (#185772)
The `__ob_trap` type specifier can be used to trap (or warn with sanitizers) when overflow or truncation occurs on the specified type.

There was a gap in coverage for this with the `-fsanitize=implicit-integer-sign-change` sanitizer. Fix this by carrying around `__ob_trap` information through `EmitIntegerSignChange()` which allows us to properly trap or warn.
2026-03-17 11:28:53 -07:00
albertbolt1
6e17b2ef33
[CIR][AArch64] Upstream NEON shift left builtins (#186406)
This PR adds CIR generation for the following AArch64 NEON builtins:

`__builtin_neon_vshld_n_s64` and `__builtin_neon_vshld_n_u64` (constant
shifts): the constant value is extracted and used directly for the shift
left.

`__builtin_neon_vshld_s64` and `__builtin_neon_vshld_u64` (variable
shifts): an existing function already handles SISD (Single Instruction,
Single Data) builtins, so it is reused to create the right CIR
instructions.


__builtin_neon_vshld_s64 -- call i64 @llvm.aarch64.neon.sshl.i64(i64
[[A]], i64 [[B]])
__builtin_neon_vshld_u64 -- call i64 @llvm.aarch64.neon.ushl.i64(i64
[[A]], i64 [[B]])
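
For context, the variable-shift forms take a signed shift amount from the low byte of the second operand, and a negative amount shifts right. A scalar C sketch of that behaviour follows; the `_model` names and `shift_amount` are hypothetical helpers, and the casts assume the usual two's-complement conversions of GCC and Clang.

```c
#include <stdint.h>

/* Signed shift amount taken from the low byte of `b`. */
int shift_amount(int64_t b) {
    int low = (int)((uint64_t)b & 0xffu);
    return low >= 128 ? low - 256 : low;
}

/* Unsigned form: left shift for positive amounts, logical right shift for
 * negative ones; shifting by 64 or more in either direction gives 0. */
uint64_t ushl_u64_model(uint64_t a, int64_t b) {
    int shift = shift_amount(b);
    if (shift >= 64 || shift <= -64)
        return 0;
    return shift >= 0 ? a << shift : a >> -shift;
}

/* Signed form: right shifts are arithmetic, so they fill with the sign. */
int64_t sshl_s64_model(int64_t a, int64_t b) {
    int shift = shift_amount(b);
    if (shift >= 64)
        return 0;
    if (shift >= 0)
        return (int64_t)((uint64_t)a << shift);
    if (shift <= -64)
        return a < 0 ? -1 : 0;
    /* Portable arithmetic right shift for negative values. */
    return a < 0 ? ~(~a >> -shift) : a >> -shift;
}
```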

added test cases in intrinsics.c by looking at the test cases present in

https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon-shifts.c

Before the change, these builtins hit a not-implemented error; with the
change, the error is gone and code generation succeeds.


ran the test cases using 
```
bin/llvm-lit -v  \
  /Users/albertbolt/projects/llvm-project/clang/test/CodeGen/AArch64/neon/intrinsics.c
```

#185382

---------

Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
2026-03-17 13:43:14 +00:00
Andrzej Warzyński
0fa9a7797b
[Clang][AArch64] Update comments in tests (nfc) (#186885) 2026-03-16 21:21:34 +00:00
Jiahao Guo
04797bc692
[CIR][AArch64] Lower BF16 vduph lane builtins (#185852)
Part of #185382.

Lower `__builtin_neon_vduph_lane_bf16` and
`__builtin_neon_vduph_laneq_bf16` in ClangIR to `cir.vec.extract`,
and add dedicated AArch64 Neon BF16 tests.

This is my first LLVM PR, so I'd really appreciate any suggestions on
the implementation, test structure, or general LLVM contribution style.
2026-03-16 16:50:41 +00:00
Andrzej Warzyński
a1054ec627
[clang][AArch64] Update label in test (nfc) (#186759) 2026-03-16 11:22:56 +00:00
Henrich Lauko
3bc216c29c
[CIR] Split CIR_UnaryOp into individual operations (#185280)
Split the monolithic cir.unary operation (which dispatched on a
UnaryOpKind enum) into four separate operations: cir.inc, cir.dec,
cir.minus, and cir.not.

Changes:
- Add CIR_UnaryOpInterface with getInput()/getResult() methods
- Add CIR_UnaryOp and CIR_UnaryOpWithOverflowFlag base classes
- Define IncOp, DecOp, MinusOp, NotOp with per-op folds
- Add Involution trait to NotOp for not(not(x)) -> x folding
- Replace createUnaryOp() with createInc/Dec/Minus/Not builders
- Split LLVM lowering into four separate patterns
- Split LoweringPrepare complex-type handling per unary op
- Update CIRCanonicalize and CIRSimplify for new op types
- Update all codegen files to use bool params instead of UnaryOpKind
- Remove CIR_UnaryOpKind enum and old CIR_UnaryOp definition

Assembly format change:
  cir.unary(inc, %x) nsw : !s32i, !s32i  ->  cir.inc nsw %x : !s32i
  cir.unary(not, %x) : !u32i, !u32i      ->  cir.not %x : !u32i
2026-03-14 23:50:43 +01:00
Prabhu Rajasekaran
60669c1cfe
Fix callee type generation (#186272)
The callee_type metadata is expected to be a list of generalized type
metadata by the IR verifier. But for indirect calls with internal
linkage the type metadata is just an integer. Avoid including them in
callee_type metadata.

This will reduce the precision of the generated call graph, as the edges to internal linkage functions whose addresses were taken will no longer be present. We need to handle this in the future.
2026-03-13 16:24:22 -07:00
Lei Huang
c3a13616c6
XFAIL on AIX: clang/test/CodeGen/distributed-thin-lto/pass-plugin.ll (#186452) 2026-03-13 13:04:31 -04:00
Lei Huang
097e786016
XFAIL clang/test/CodeGen/distributed-thin-lto/pass-plugin.ll (#186425)
Failing on AIX as it can't find the new symbol in the exported list.
XFAIL to bring the bots green while we investigate.

Test introduced in: https://github.com/llvm/llvm-project/pull/183525
2026-03-13 11:50:13 -04:00
paperchalice
26ac669101
[LLVM] Remove "no-nans-fp-math" attribute support (#186285)
Now all `NoNaNsFPMath` uses have been removed, remove this attribute.
2026-03-13 09:29:28 +00:00
Justin Stitt
6c35a6736c
[Clang] Check sanitizer ignorelist for divrem overflow (#185721)
Instrumentation emitted for overflow by division did not consult the type entries of the sanitizer special case list.

The original type-based ignorelist support (#107332) added `isTypeIgnoredBySanitizer` calls to `CanElideOverflowCheck`, which covers `+`, `-`, `*`, `++`, `--`. However, division and remainder have a separate code path in `EmitUndefinedBehaviorIntegerDivAndRemCheck` that never calls `CanElideOverflowCheck` or checks the ignorelist directly.

Add a check so that the SCL is honored for the div/rem case.
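
For reference, the conditions a div/rem check guards against can be stated in a few lines of C. The helper name `int_divrem_is_undefined` is hypothetical, and the ignorelist lookup itself is not modelled.

```c
#include <limits.h>
#include <stdbool.h>

/* True when `lhs / rhs` or `lhs % rhs` is undefined for `int` operands:
 * either the divisor is zero, or INT_MIN is divided by -1. */
bool int_divrem_is_undefined(int lhs, int rhs) {
    if (rhs == 0)
        return true;
    return lhs == INT_MIN && rhs == -1;
}
```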
2026-03-12 14:42:32 -07:00
Lei Huang
bf85f52fbd
[PowerPC] Update dmr builtin names (#183160)
Remove `_mma` from the following built-ins as they are not related to
MMA:

* __builtin_mma_dmsetdmrz
* __builtin_mma_dmmr
* __builtin_mma_dmxor
* __builtin_mma_build_dmr
* __builtin_mma_disassemble_dmr

AI Assisted.
2026-03-12 12:54:07 -04:00
Artem Belevich
595b961400
[CUDA] Use monotonic ordering for __nvvm_atom* builtins (#185822)
CUDA's __nvvm_atom* builtins are expected to produce atomic operations
with relaxed ordering. However, Clang lowered them as atomicrmw and cmpxchg
with the default seq_cst ordering. That mismatch went unnoticed because,
until recently, the NVPTX back end was unable to lower all atomic instructions
correctly, and despite using `seq_cst` ordering in IR we ended up generating
the intended PTX instructions with relaxed ordering. It worked well enough until
https://github.com/llvm/llvm-project/pull/179553 implemented correct NVPTX
atomic lowering.
That, in turn, caused a severe performance regression for the code that
relied on these builtins.

Thanks to @akshayrdeodhar for figuring out what happened.

Switching __nvvm_atom* builtins to generate atomic instructions with
monotonic ordering matches the expected semantics of the builtins,
and restores performance of the generated code.
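
The difference between the two orderings can be shown with C11 atomics, where `memory_order_relaxed` corresponds to LLVM's monotonic ordering. This is an analogy in standard C, not the CUDA builtins themselves; the two wrapper names are hypothetical.

```c
#include <stdatomic.h>

/* Relaxed read-modify-write: the update is atomic, but it imposes no
 * ordering on surrounding memory accesses. */
int fetch_add_relaxed(atomic_int *counter, int value) {
    return atomic_fetch_add_explicit(counter, value, memory_order_relaxed);
}

/* Sequentially consistent variant: same result for a single thread, but
 * with the strongest ordering guarantees. */
int fetch_add_seq_cst(atomic_int *counter, int value) {
    return atomic_fetch_add_explicit(counter, value, memory_order_seq_cst);
}
```

Both functions return the previous value; only the ordering constraint placed on the surrounding code differs.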

See:
https://github.com/llvm/llvm-project/pull/179553#issuecomment-4035193968
2026-03-12 09:48:09 -07:00
Andrzej Warzyński
5e887716b0
[Clang][AArch64] Remove duplicate CodeGen test for bf16 get/set intrinsics (#186084)
The following test files contain identical test bodies (aside from the
RUN lines):

  * clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c
  * clang/test/CodeGen/arm-bf16-getset-intrinsics.c

The differences in the RUN lines do not appear to be relevant for the
tested functionality. This change keeps a single test file and
simplifies its RUN lines to match the generic style used in
clang/test/CodeGen/AArch64/neon.

This also moves toward unifying and reusing RUN lines across tests.
2026-03-12 13:00:28 +00:00
Matt Arsenault
7cb3005ba2
AMDGPU: Add dereferenceable attribute to dispatch ptr intrinsic (#185955)
Stop manually setting it on the callsite in clang.
2026-03-12 07:28:39 +01:00