9758 Commits

Author SHA1 Message Date
Austin Schuh
2abcdd8cf0
[CUDA] Add support for CUDA surfaces (#132883)
This adds support for all the surface read and write calls to clang. It
extends the pattern used for textures to surfaces too.

I tested this by generating all the various permutations of the calls
and argument types in a python script, compiling them with both clang
and nvcc, and comparing the generated ptx for equivalence. They all
agree, ignoring register allocation, and some places where Clang picks
different memory write instructions. An example kernel is:

```
__global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result) {
    *result = surf1Dread<float2>(surfObj, x, cudaBoundaryModeZero);
}
```

---------

Signed-off-by: Austin Schuh <austin.linux@gmail.com>
2025-04-03 10:08:02 -07:00
Sami Tolvanen
acc6bcdc50
Support alternative sections for patchable function entries (#131230)
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.

The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on __patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.

Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:

https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
2025-04-02 21:53:55 +00:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
yonghong-song
f99072bd8c
[Clang][BPF] Add tests for btf_type_tag c2x-style attributes (#133666)
For btf_type_tag implementation, in order to have the same results with
clang (__attribute__((btf_type_tag("...")))), gcc intends to use c2x
syntax '[[...]]'. Clang also supports similar c2x syntax. Currently, the
clang selftest contains the following five tests:
```
  attr-btf_type_tag-func.c
  attr-btf_type_tag-similar-type.c
  attr-btf_type_tag-var.c
  attr-btf_type_tag-func-ptr.c
  attr-btf_type_tag-typedef-field.c
```

Tests attr-btf_type_tag-func.c and attr-btf_type_tag-var.c already have
c2x syntax tests.

Test attr-btf_type_tag-func-ptr.c does not support c2x syntax when
'__attribute__((...))' is replaced with '[[...]]'. This should not
be an issue since we do not have use cases for function pointer yet.

This patch added '[[...]]' syntax for
```
  attr-btf_type_tag-similar-type.c
  attr-btf_type_tag-typedef-field.c
```
2025-04-02 07:31:32 -07:00
Maxim Zhukov
2b7daaf967
[sanitizer][CFI] Add support to build CFI with sanitize-coverage (#131296)
Added ability to build together with -fsanitize=cfi and
-fsanitize-coverage=trace-cmp at the same time.
2025-04-02 16:05:44 +03:00
Virginia Cangelosi
79487757b7
[Clang][LLVM] Implement multi-multi vectors MOP4{A/S} (#129230)
Implement all multi-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the ACLE in
https://github.com/ARM-software/acle/pull/381/files
2025-04-01 19:20:27 +01:00
Jonathan Thackray
558ce50ebc
[Clang][LLVM] Implement multi-single vectors MOP4{A/S} (#129226)
Implement all multi-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and
llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files
2025-04-01 17:04:59 +01:00
Zahira Ammarguellat
aa73124e51
Fix complex long double division with -mno-x87. (#133152)
The combination of `-fcomplex-arithmetic=promoted` and `-mno-x87` for
`double` complex division is leading to a crash.
See https://godbolt.org/z/189G957oY
This patch fixes that.
2025-04-01 11:10:51 -04:00
Virginia Cangelosi
e92ff64bad
[Clang][LLVM] Implement single-multi vectors MOP4{A/S} (#128854)
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the ACLE in
https://github.com/ARM-software/acle/pull/381/files.

This PR depends on https://github.com/llvm/llvm-project/pull/127797

This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
2025-04-01 15:05:30 +01:00
Virginia Cangelosi
6892d54286
[Clang][LLVM] Implement single-single vectors MOP4{A/S} (#127797)
Implement all single-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the ACLE in
https://github.com/ARM-software/acle/pull/381/files
2025-04-01 13:35:09 +01:00
Lukacma
518102f259
Fix test failures caused by #127043 (#133895)
2025-04-01 11:42:22 +01:00
Lukacma
6c3adaafe3
[AARCH64][Neon] switch to using bitcasts in arm_neon.h where appropriate (#127043)
Currently arm_neon.h emits C-style casts to do vector type casts. This
relies on implicit conversion between vector types being enabled, which
is deprecated behaviour and will soon disappear. To ensure
NEON code keeps working afterwards, this patch changes all these
vector type casts into bitcasts.


Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2025-04-01 09:45:16 +01:00
Alan Zhao
c5b3fe2094
[clang] Automatically add the returns_twice attribute to certain functions even if -fno-builtin is set (#133511)
Certain functions require the `returns_twice` attribute in order to
produce correct codegen. However, `-fno-builtin` removes all knowledge
of functions that require this attribute, so this PR modifies Clang to
add the `returns_twice` attribute even if `-fno-builtin` is set. This
behavior is also consistent with what GCC does.

It's not (easily) possible to get the builtin information from
`Builtins.td` because `-fno-builtin` causes Clang to never initialize
any builtins, so functions never get tokenized as functions/builtins
that require `returns_twice`. Therefore, the most straightforward
solution is to explicitly hard code the function names that require
`returns_twice`.

Fixes #122840
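
A minimal sketch of why the attribute matters, using `setjmp` as a well-known
function that returns twice. This is only an illustration: the helper name
`countSetjmpReturns` is hypothetical, and the exact list of function names that
Clang hard codes is not reproduced here.

```cpp
#include <csetjmp>

// Jump buffer shared between the setjmp call and the longjmp call.
static std::jmp_buf jumpBuffer;

// Hypothetical helper: counts how many times the single setjmp call below
// returns. The counter is volatile so that its value is well defined after
// longjmp transfers control back into this frame.
int countSetjmpReturns() {
  volatile int returns = 0;
  if (setjmp(jumpBuffer) == 0) {
    // First return: direct call to setjmp.
    returns = returns + 1;
    std::longjmp(jumpBuffer, 1);
  } else {
    // Second return: control arrives here via longjmp.
    returns = returns + 1;
  }
  return returns;
}
```

Because control re-enters the function after the `setjmp` call has already
returned once, the optimizer has to be told that the call site can return a
second time. That is what `returns_twice` communicates.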
2025-03-31 09:42:34 -07:00
Florian Mayer
c0952a931c
[clang] [sanitizer] add pseudofunction to indicate array-bounds check (#128977)
With this, we can:

* use profilers to estimate how many cycles we spend on these checks
(subject to caveats),
* more easily see why we crashed.
2025-03-28 13:21:03 -07:00
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this led to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
Mallikarjuna Gouda
1318a7bb09
Reland [MIPS] Define SubTargetFeature for i6500 cpu (#132907) (#133366)
Relands #132907 with a fix in the testcase:
clang/test/CodeGen/Mips/subtarget-feature-test.c
enable this test for only mips64 target

PR #130587 defined the same SubTargetFeature for CPUs i6400 and i6500, which
resulted in the following warning when -mcpu=i6500 was used:

'+i6500' is not a recognized feature for this target (ignoring feature)

This PR fixes the above issue by defining a separate SubTargetFeature for
i6500.
2025-03-28 09:49:38 +01:00
Djordje Todorovic
58a0c9570c
Revert "[MIPS] Define SubTargetFeature for i6500 cpu" (#133215)
Reverts llvm/llvm-project#132907 due to some test failures.
2025-03-27 09:06:02 +01:00
Mallikarjuna Gouda
6294325a53
[MIPS] Define SubTargetFeature for i6500 cpu (#132907)
PR #130587 defined the same SubTargetFeature for CPUs i6400 and i6500, which
resulted in the following warning when -mcpu=i6500 was used:

'+i6500' is not a recognized feature for this target (ignoring feature)

This PR fixes the above issue by defining a separate SubTargetFeature for
i6500.
2025-03-27 08:48:34 +01:00
Alexandros Lamprineas
cd3798d7ef
[FMV][AArch64] Add feature CSSC and detect on linux platform. (#132727)
Also removes priority bits for unused features predres and ls64.

Added to ACLE with https://github.com/ARM-software/acle/pull/390
2025-03-26 08:40:29 +00:00
Mészáros Gergely
a8588d8b2a
[CodeGen][NFC] Run SROA on complex range tests (#131925)
... to make them shorter and easier to read. This removes ~2000 lines of
cruft.
2025-03-26 06:07:41 +01:00
Brandon Wu
f6417f17ba
[clang][RISCV] Fix RUN line and rename test name for pr129995 (#132676)
ninja check-clang can not detect .cc suffix, so the typo is not
detected.
2025-03-26 08:41:43 +08:00
Alexandros Lamprineas
bf2d30e092
[NFC][FMV][AArch64] Tidy up codegen tests. (#132273)
Removes attr-target-version.c which doesn't have a clear purpose.
Introduces AArch64/fmv-detection.c to check detection bitmasks.
Adds coverage in AArch64/fmv-resolver-emission.c
2025-03-24 11:39:51 +00:00
Jesse Huang
20b5728b7b
[RISCV] Implement the implications of C extension (#132259)
Implement the following implications according to the [Zc
spec](https://github.com/riscvarchive/riscv-code-size-reduction/blob/main/Zc-specification/Zc.adoc#13-c)

> As C defines the same instructions as Zca, Zcf and Zcd, the rule is
that:
> * C always implies Zca
> * C+F implies Zcf (RV32 only)
> * C+D implies Zcd
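
The quoted rule can be sketched as a small function over a set of extension
names. This is an illustration only, not the code from the patch: the helper
name `addCImplications`, the lowercase spelling of the names, and the `xlen`
parameter are assumptions made for the example.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: given the enabled extensions and the register width,
// add the Zc* extensions that the C extension implies.
std::set<std::string> addCImplications(std::set<std::string> exts, unsigned xlen) {
  assert(xlen == 32 || xlen == 64);
  if (exts.count("c") == 0)
    return exts;                    // No C, so nothing is implied.
  exts.insert("zca");               // C always implies Zca.
  if (exts.count("f") != 0 && xlen == 32)
    exts.insert("zcf");             // C+F implies Zcf, on RV32 only.
  if (exts.count("d") != 0)
    exts.insert("zcd");             // C+D implies Zcd.
  return exts;
}
```

For example, `addCImplications({"c", "f", "d"}, 32)` adds `zca`, `zcf` and
`zcd`, while the same input with `xlen` set to 64 adds only `zca` and `zcd`.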
2025-03-22 14:48:52 +08:00
Ben Shi
597accfea6
[clang][CodeGen][AVR] Fix a crash in AVRABIInfo (#131976)
fixes https://github.com/llvm/llvm-project/issues/131967
2025-03-22 13:22:32 +08:00
Phoebe Wang
df4257b038
[X86][AVX10.2] Remove YMM rounding from VCVT[,T]PS2I[,U]BS (#132426)
Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
2025-03-22 08:42:22 +08:00
Shilei Tian
f1ac2afe21
Reapply "[AMDGPU] Use COV6 by default (#118515)" (#130963)
This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.
2025-03-21 15:26:45 -04:00
Phoebe Wang
e1a16033dc
[X86][AVX10.2] Remove YMM rounding from VCVTTP.*QS (#132414)
Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
2025-03-22 01:10:39 +08:00
Phoebe Wang
d7e7e0af48
[X86][AVX10.2] Remove YMM rounding from VMINMAXP[H,S,D] (#132405)
Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
2025-03-22 00:56:23 +08:00
Phoebe Wang
924c7ea76a
[X86][AVX10.2] Remove YMM rounding from VCVT2PS2PHX (#132397)
Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
2025-03-21 22:51:51 +08:00
Phoebe Wang
09feaa9261
Revert "[X86][AVX10.2] Support YMM rounding new instructions (#101825)" (#132362)
This reverts commit 0dba5381d8c8e4cadc32a067bf2fe5e3486ae53d.

YMM rounding was removed from AVX10 whitepaper. Ref:
https://cdrdv2.intel.com/v1/dl/getContent/784343

The MINMAX and SATURATING CONVERT instructions will be removed as a
follow up.
2025-03-21 20:12:57 +08:00
Phoebe Wang
19d2023a66
[X86][AVX10.2] Use 's_' for saturate-convert intrinsics (#131592)
- Add '_' after cvt[t]s intrinsics when 's' is for saturation;
- Add 's_' for all ipcvt[t] intrinsics since they are all saturation
ones;
- Move 's' after 'cvt' and add '_' after it for prior `biass`
intrinsics;

This is to solve potential confusion since 's' before a type usually
stands for scalar.

Synced with GCC folks and they will change in the same way.
2025-03-21 11:00:51 +08:00
Ricardo Jesus
74f5a028cb
Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#129732)" (#130625)
The original patch from #129732 exposed a bug in `getMemVTFromNode`, which was returning incorrect types for fixed length vectors.
2025-03-19 08:25:37 +00:00
Mészáros Gergely
f017073cd8
[Clang][CodeGen] Promote in complex compound divassign (#131453)
When `-fcomplex-arithmetic=promoted` is set, complex divassign `/=` should
promote to a wider type the same way division (without assignment) does.
Prior to this change, Smith's algorithm would be used for divassign.

Fixes: https://github.com/llvm/llvm-project/issues/131129
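
The difference between the two strategies can be sketched on scalar values.
This is a rough model, not Clang's lowering: the type `ComplexF` and the
function names are hypothetical, and it assumes the promoted path applies the
textbook division formula in `double` before truncating back to `float`.

```cpp
#include <cmath>

// Hypothetical stand-in for a float _Complex value.
struct ComplexF {
  float re;
  float im;
};

// Smith's algorithm: stay in float, but scale by the larger component of the
// divisor so that intermediate products are less likely to overflow.
ComplexF divideSmith(ComplexF x, ComplexF y) {
  if (std::fabs(y.re) >= std::fabs(y.im)) {
    float r = y.im / y.re;
    float den = y.re + y.im * r;
    return {(x.re + x.im * r) / den, (x.im - x.re * r) / den};
  }
  float r = y.re / y.im;
  float den = y.re * r + y.im;
  return {(x.re * r + x.im) / den, (x.im * r - x.re) / den};
}

// Promoted division: widen the operands, use the plain algebraic formula,
// then truncate the result back to float.
ComplexF dividePromoted(ComplexF x, ComplexF y) {
  double a = x.re, b = x.im, c = y.re, d = y.im;
  double den = c * c + d * d;
  return {static_cast<float>((a * c + b * d) / den),
          static_cast<float>((b * c - a * d) / den)};
}

// x /= y following the same promoted path as plain division.
ComplexF &divAssignPromoted(ComplexF &x, ComplexF y) {
  x = dividePromoted(x, y);
  return x;
}
```

In this sketch, `divAssignPromoted` simply reuses `dividePromoted`, which
mirrors the intent of the change: `/=` and `/` should take the same path when
promotion is requested.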
2025-03-19 07:29:45 +01:00
Mészáros Gergely
1bd6716d33
[Clang][CodeGen] Do not promote if complex divisor is real (#131451)
Relates-to: https://github.com/llvm/llvm-project/issues/131129
2025-03-19 07:26:54 +01:00
Mészáros Gergely
7b00b0b758
[Clang][NFC] Extend cmplx range tests for #131129 (#131447)
- Add tests for complex dividend and real divisor
- Add tests for complex * real multiplication
- Add tests for multiply/divide and assign (`/=`,`*=`) operators
2025-03-19 06:24:02 +01:00
Matt Arsenault
6f44be97d0
IR: Make llvm.fake.use a DefaultAttrsIntrinsic (#131743)
This shouldn't be special and is just an ordinary side effect.
2025-03-19 08:29:04 +07:00
Pedro Lobo
98943c4bd8
[ARM,MVE] Change placeholder from undef to poison (#131689)
Call `insertelement` on a `poison` value instead of `undef`.
2025-03-18 22:37:46 +00:00
Aaron Ballman
d781ac1cf0
[C23] Add __builtin_c23_va_start (#131166)
This builtin is supported by GCC and is a way to improve diagnostic
behavior for va_start in C23 mode. C23 no longer requires a second
argument to the va_start macro in support of variadic functions with no
leading parameters. However, we still want to diagnose passing more than
two arguments, or diagnose when passing something other than the last
parameter in the variadic function.

This also updates the freestanding <stdarg.h> header to use the new
builtin, same as how GCC works.

Fixes #124031
2025-03-15 11:01:53 -04:00
Brandon Wu
8727097ffd
[RISCV][Sema] Add feature check for target attribute to VSETVL intrinsics (#126064)
This fixes the target attribute issue for vsetvl and vsetvlmax
intrinsics.
Fixes #125154
2025-03-14 13:36:47 +08:00
Veera
5073b5fdfa
[CVP] Infer nuw/nsw flags for TruncInst (#130504)
Proof: https://alive2.llvm.org/ce/z/U-G7yV

Helps: https://github.com/rust-lang/rust/issues/72646 and
https://github.com/rust-lang/rust/issues/122734

  Rust compiler's current output: https://godbolt.org/z/7E3fET6Md

IPSCCP can do this transform but it does not help the motivating issue
since it runs only once early in the optimization pipeline.

Reimplementing this in CVP folds the motivating issue into a simple
`icmp eq` instruction.
  
  Fixes #130100
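
As a rough illustration of when the flags are sound, the check below models
the LLVM meaning of `trunc nuw` (no non-zero bits are dropped) and `trunc nsw`
(the dropped bits all equal the sign bit of the result) in terms of known
value ranges. It is not taken from the patch: `TruncFlags`, `inferTruncFlags`
and the range parameters are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: which flags may be attached to the trunc.
struct TruncFlags {
  bool nuw;
  bool nsw;
};

// The 64-bit source value is known to be at most umax when read as unsigned,
// and to lie in [smin, smax] when read as signed. dstBits is the width of the
// truncated type.
TruncFlags inferTruncFlags(uint64_t umax, int64_t smin, int64_t smax,
                           unsigned dstBits) {
  assert(dstBits >= 1 && dstBits < 64);
  const uint64_t unsignedLimit = (uint64_t{1} << dstBits) - 1;
  const int64_t signedMax = (int64_t{1} << (dstBits - 1)) - 1;
  const int64_t signedMin = -signedMax - 1;
  // nuw: every possible value fits in dstBits as an unsigned number.
  // nsw: every possible value fits in dstBits as a signed number.
  return {umax <= unsignedLimit, smin >= signedMin && smax <= signedMax};
}
```

For a value known to be in [0, 100] that is truncated to 8 bits, both flags
can be set; for [0, 255] only `nuw` holds, since 255 does not fit in a signed
8-bit integer.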
2025-03-12 08:25:24 -04:00
Juan Manuel Martinez Caamaño
7decd04626
[Clang] Add __builtin_elementwise_exp10 in the same fashion as exp/exp2 (#130746)
Clang has __builtin_elementwise_exp and __builtin_elementwise_exp2
intrinsics, but no __builtin_elementwise_exp10.

There doesn't seem to be a good reason not to expose the exp10 flavour
of this intrinsic too.

This commit introduces this intrinsic following the same pattern as the
exp and exp2 versions.

Fixes: SWDEV-519541
2025-03-12 09:20:29 +01:00
Younan Zhang
c12761858c
[Clang] Fix the printout of CXXParenListInitExpr involving default arguments (#130731)
The parentheses are unnecessary IMO because they should have been
handled in the parents of such expressions, e.g. in CXXFunctionalCastExpr.

Moreover, we shouldn't join CXXDefaultInitExpr either because they are
not printed at all.
2025-03-12 10:39:44 +08:00
Juan Manuel Martinez Caamaño
83ec179fc8
[Clang][NFC] Rename and update_cc_test_checks over strictfp-elementwise-builtins.cpp (#130747)
2025-03-11 17:16:32 +01:00
Benjamin Maxwell
fb397ab1e5
Reland "[clang] Lower modf builtin using llvm.modf intrinsic" (#130761)
Reverts
c40f0fe434

Original description:
This updates the existing modf[f|l] builtin to be lowered via the
llvm.modf.* intrinsic (rather than directly to a library call).

The Windows 32-bit x86 missing `modff` symbol issue should have been
solved in: https://github.com/llvm/llvm-project/pull/130636.
2025-03-11 14:55:33 +00:00
Younan Zhang
f4218753ad
[Clang] Implement P0963R3 "Structured binding declaration as a condition" (#130228)
This implements the R2 semantics of P0963.

The R1 semantics, as outlined in the paper, were introduced in Clang 6.
In addition to that, the paper proposes swapping the evaluation order of
condition expressions and the initialization of binding declarations
(i.e. std::tuple-like decompositions).
2025-03-11 15:41:56 +08:00
Hans Wennborg
c40f0fe434
Revert "Reland "[clang] Lower modf builtin using llvm.modf intrinsic" (#129885)"
This broke modff calls on 32-bit x86 Windows. See comment on the PR.

> This updates the existing modf[f|l] builtin to be lowered via the
> llvm.modf.* intrinsic (rather than directly to a library call).
>
> The legalization issues exposed by the original PR (#126750) should have
> been fixed in #128055 and #129264.

This reverts commit cd1d9a8fab05524a27ffdb251f6def37786b5cc1.
2025-03-10 16:35:03 +01:00
Benson Chu
3b3356043c
Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute"
This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.
2025-03-10 10:11:23 -05:00
Benson Chu
1f05703176
[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute
FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.

- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2

FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so different amounts of bytes are
being pushed onto the stack depending on our target, which would
affect alignment for subsequent saves.

DPRCS1 will sum up all previous bytes that were saved, and will emit
extra instructions to ensure that its alignment is correct. My
assumption is that if DPRCS1 is able to correct its own alignment,
then all subsequent saves will also have correct alignment.

Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup.  Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.

Co-authored-by: Jake Vossen <jake@vossen.dev>
Co-authored-by: Alan Phipps <a-phipps@ti.com>
2025-03-10 10:05:15 -05:00
Csanád Hajdú
c579ec66c7
[Clang][AArch64] Add support for SHF_AARCH64_PURECODE ELF section flag (2/3) (#125688)
Add support for the new SHF_AARCH64_PURECODE ELF section flag:
https://github.com/ARM-software/abi-aa/pull/304

The general implementation follows the existing one for ARM targets.
Similarly to ARM targets, generating object files with the
`SHF_AARCH64_PURECODE` flag set is enabled by the
`-mexecute-only`/`-mpure-code` driver flag.

Related PRs:
* LLVM: https://github.com/llvm/llvm-project/pull/125687
* LLD: https://github.com/llvm/llvm-project/pull/125689
2025-03-10 09:26:53 +00:00
Timm Baeder
d08cf7900d
[clang][bytecode] Implement __builtin_constant_p (#130143)
Use the regular code paths for interpreting.

Add new instructions: `StartSpeculation` will reset the diagnostics
pointers to `nullptr`, which will keep us from reporting any diagnostics
during speculation. `EndSpeculation` will undo this.

The rest depends on what `Emitter` we use.

For `EvalEmitter`, we have no bytecode, so we implement `speculate()` by
simply visiting the first argument of `__builtin_constant_p`. If the
evaluation fails, we push a `0` on the stack, otherwise a `1`.

For `ByteCodeEmitter`, add another instruction called `BCP`, that
interprets all the instructions following it until the next
`EndSpeculation` instruction. If any of those instructions fails, we
jump to the `EndLabel`, which brings us right before the
`EndSpeculation`. We then push the result on the stack.
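
The observable contract that both emitters have to preserve can be sketched
from the user's side: a successful evaluation of the argument yields `1`, a
failed one yields `0`, and no diagnostic is reported for the failure. The
snippet assumes GCC or Clang, since `__builtin_constant_p` is a compiler
extension, and the function names are hypothetical.

```cpp
// The argument folds to a constant, so the builtin itself is usable in a
// constant expression and evaluates to 1.
static_assert(__builtin_constant_p(1 + 2) == 1,
              "literal arithmetic is a compile-time constant");

// Hypothetical helper: an integer literal is always a known constant.
int literalIsConstant() { return __builtin_constant_p(42); }

// Hypothetical helper: a volatile read can never be folded, so evaluating the
// argument fails and the builtin yields 0 instead of an error.
int volatileReadIsConstant() {
  volatile int runtimeValue = 7;
  return __builtin_constant_p(runtimeValue);
}
```

Here `literalIsConstant()` returns 1 and `volatileReadIsConstant()` returns 0;
the second case corresponds to a speculation that fails silently.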
2025-03-08 06:06:14 +01:00