llvm-project

Author	SHA1	Message	Date
Austin Schuh	2abcdd8cf0	[CUDA] Add support for CUDA surfaces (#132883 ) This adds support for all the surface read and write calls to clang. It extends the pattern used for textures to surfaces too. I tested this by generating all the various permutations of the calls and argument types in a python script, compiling them with both clang and nvcc, and comparing the generated ptx for equivalence. They all agree, ignoring register allocation, and some places where Clang picks different memory write instructions. An example kernel is: ``` __global__ void testKernel(cudaSurfaceObject_t surfObj, int x, float2* result) { *result = surf1Dread<float2>(surfObj, x, cudaBoundaryModeZero); } ``` --------- Signed-off-by: Austin Schuh <austin.linux@gmail.com>	2025-04-03 10:08:02 -07:00
Sami Tolvanen	acc6bcdc50	Support alternative sections for patchable function entries (#131230 ) With -fpatchable-function-entry (or the patchable_function_entry function attribute), we emit records of patchable entry locations to the __patchable_function_entries section. Add an additional parameter to the command line option that allows one to specify a different default section name for the records, and an identical parameter to the function attribute that allows one to override the section used. The main use case for this change is the Linux kernel using prefix NOPs for ftrace, and thus depending on __patchable_function_entries to locate traceable functions. Functions that are not traceable currently disable entry NOPs using the function attribute, but this creates a compatibility issue with -fsanitize=kcfi, which expects all indirectly callable functions to have a type hash prefix at the same offset from the function entry. Adding a section parameter would allow the kernel to distinguish between traceable and non-traceable functions by adding entry records to separate sections while maintaining a stable function prefix layout for all functions. LKML discussion: https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/	2025-04-02 21:53:55 +00:00
Juan Manuel Martinez Caamaño	beae0e9f1a	[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055 ) This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.	2025-04-02 20:00:09 +02:00
yonghong-song	f99072bd8c	[Clang][BPF] Add tests for btf_type_tag c2x-style attributes (#133666 ) For btf_type_tag implementation, in order to have the same results with clang (__attribute__((btf_type_tag("...")))), gcc intends to use c2x syntax '[[...]]'. Clang also supports similar c2x syntax. Currently, the clang selftest contains the following five tests: ``` attr-btf_type_tag-func.c attr-btf_type_tag-similar-type.c attr-btf_type_tag-var.c attr-btf_type_tag-func-ptr.c attr-btf_type_tag-typedef-field.c ``` Tests attr-btf_type_tag-func.c and attr-btf_type_tag-var.c already have c2x syntax tests. Test attr-btf_type_tag-func-ptr.c does not support c2x syntax when '__attribute__((...))' is replaced with '[[...]]'. This should not be an issue since we do not have use cases for function pointer yet. This patch added '[[...]]' syntax for ``` attr-btf_type_tag-similar-type.c attr-btf_type_tag-typedef-field.c ```	2025-04-02 07:31:32 -07:00
Maxim Zhukov	2b7daaf967	[sanitizer][CFI] Add support to build CFI with sanitize-coverage (#131296 ) Added ability to build together with -fsanitize=cfi and -fsanitize-coverage=trace-cmp at the same time.	2025-04-02 16:05:44 +03:00
Virginia Cangelosi	79487757b7	[Clang][LLVM] Implement multi-multi vectors MOP4{A/S} (#129230 ) Implement all multi-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the acle in https://github.com/ARM-software/acle/pull/381/files	2025-04-01 19:20:27 +01:00
Jonathan Thackray	558ce50ebc	[Clang][LLVM] Implement multi-single vectors MOP4{A/S} (#129226 ) Implement all multi-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files	2025-04-01 17:04:59 +01:00
Zahira Ammarguellat	aa73124e51	Fix complex long double division with -mno-x87. (#133152 ) The combination of `-fcomplex-arithmetic=promoted` and `-mno-x87` for `double` complex division is leading to a crash. See https://godbolt.org/z/189G957oY This patch fixes that.	2025-04-01 11:10:51 -04:00
Virginia Cangelosi	e92ff64bad	[Clang][LLVM] Implement single-multi vectors MOP4{A/S} (#128854 ) Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the acle in https://github.com/ARM-software/acle/pull/381/files. This PR depends on https://github.com/llvm/llvm-project/pull/127797 This patch updates the semantics of template arguments in intrinsic names for clarity and ease of use. Previously, template argument numbers indicated which character in the prototype string determined the final type suffix, which was confusing—especially for intrinsics using multiple prototype modifiers per operand (e.g., intrinsics operating on arrays of vectors). The number had to reference the correct character in the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and error-prone. With this patch, template argument numbers now refer to the operand number that determines the final type suffix, providing a more intuitive and consistent approach.	2025-04-01 15:05:30 +01:00
Virginia Cangelosi	6892d54286	[Clang][LLVM] Implement single-single vectors MOP4{A/S} (#127797 ) Implement all single-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the acle in https://github.com/ARM-software/acle/pull/381/files	2025-04-01 13:35:09 +01:00
Lukacma	518102f259	Fix test failures caused by #127043 (#133895 )	2025-04-01 11:42:22 +01:00
Lukacma	6c3adaafe3	[AARCH64][Neon] switch to using bitcasts in arm_neon.h where appropriate (#127043 ) Currently arm_neon.h emits C-style casts to do vector type casts. This relies on implicit conversion between vector types to be enabled, which is currently deprecated behaviour and soon will disappear. To ensure NEON code will keep working afterwards, this patch changes all these vector type casts into bitcasts. Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>	2025-04-01 09:45:16 +01:00
Alan Zhao	c5b3fe2094	[clang] Automatically add the `returns_twice` attribute to certain functions even if `-fno-builtin` is set (#133511 ) Certain functions require the `returns_twice` attribute in order to produce correct codegen. However, `-fno-builtin` removes all knowledge of functions that require this attribute, so this PR modifies Clang to add the `returns_twice` attribute even if `-fno-builtin` is set. This behavior is also consistent with what GCC does. It's not (easily) possible to get the builtin information from `Builtins.td` because `-fno-builtin` causes Clang to never initialize any builtins, so functions never get tokenized as functions/builtins that require `returns_twice`. Therefore, the most straightforward solution is to explicitly hard code the function names that require `returns_twice`. Fixes #122840	2025-03-31 09:42:34 -07:00
Florian Mayer	c0952a931c	[clang] [sanitizer] add pseudofunction to indicate array-bounds check (#128977 ) With this, we can: * use profilers to estimate how many cycles we spend on these checks (subject to caveats), * more easily see why we crashed.	2025-03-28 13:21:03 -07:00
Joseph Huber	772173f548	[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870 ) Summary: When we were first porting to COV5, this led to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.	2025-03-28 07:35:16 -05:00
Mallikarjuna Gouda	1318a7bb09	Reland [MIPS] Define SubTargetFeature for i6500 cpu (#132907 ) (#133366 ) Relands #132907 with a fix in the testcase: clang/test/CodeGen/Mips/subtarget-feature-test.c enable this test for only mips64 target PR #130587 defined the same SubTargetFeature for CPUs i6400 and i6500, which resulted in the following warning when -mcpu=i6500 was used: '+i6500' is not a recognized feature for this target (ignoring feature) This PR fixes the above issue by defining a separate SubTargetFeature for i6500.	2025-03-28 09:49:38 +01:00
Djordje Todorovic	58a0c9570c	Revert "[MIPS] Define SubTargetFeature for i6500 cpu" (#133215 ) Reverts llvm/llvm-project#132907 due to some test failures.	2025-03-27 09:06:02 +01:00
Mallikarjuna Gouda	6294325a53	[MIPS] Define SubTargetFeature for i6500 cpu (#132907 ) PR #130587 defined the same SubTargetFeature for CPUs i6400 and i6500, which resulted in the following warning when -mcpu=i6500 was used: '+i6500' is not a recognized feature for this target (ignoring feature) This PR fixes the above issue by defining a separate SubTargetFeature for i6500.	2025-03-27 08:48:34 +01:00
Alexandros Lamprineas	cd3798d7ef	[FMV][AArch64] Add feature CSSC and detect on linux platform. (#132727 ) Also removes priority bits for unused features predres and ls64. Added to ACLE with https://github.com/ARM-software/acle/pull/390	2025-03-26 08:40:29 +00:00
Mészáros Gergely	a8588d8b2a	[CodeGen][NFC] Run SROA on complex range tests (#131925 ) ... to make them shorter and easier to read. This removes ~2000 lines of cruft.	2025-03-26 06:07:41 +01:00
Brandon Wu	f6417f17ba	[clang][RISCV] Fix RUN line and rename test name for pr129995 (#132676 ) ninja check-clang cannot detect the .cc suffix, so the typo was not detected.	2025-03-26 08:41:43 +08:00
Alexandros Lamprineas	bf2d30e092	[NFC][FMV][AArch64] Tidy up codegen tests. (#132273 ) Removes attr-target-version.c which doesn't have a clear purpose. Introduces AArch64/fmv-detection.c to check detection bitmasks. Adds coverage in AArch64/fmv-resolver-emission.c	2025-03-24 11:39:51 +00:00
Jesse Huang	20b5728b7b	[RISCV] Implement the implications of C extension (#132259 ) Implement the following implications according to the [Zc spec](https://github.com/riscvarchive/riscv-code-size-reduction/blob/main/Zc-specification/Zc.adoc#13-c) > As C defines the same instructions as Zca, Zcf and Zcd, the rule is that: > * C always implies Zca > * C+F implies Zcf (RV32 only) > * C+D implies Zcd	2025-03-22 14:48:52 +08:00
Ben Shi	597accfea6	[clang][CodeGen][AVR] Fix a crash in AVRABIInfo (#131976 ) fixes https://github.com/llvm/llvm-project/issues/131967	2025-03-22 13:22:32 +08:00
Phoebe Wang	df4257b038	[X86][AVX10.2] Remove YMM rounding from VCVT[,T]PS2I[,U]BS (#132426 ) Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343	2025-03-22 08:42:22 +08:00
Shilei Tian	f1ac2afe21	Reapply "[AMDGPU] Use COV6 by default (#118515 )" (#130963 ) This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.	2025-03-21 15:26:45 -04:00
Phoebe Wang	e1a16033dc	[X86][AVX10.2] Remove YMM rounding from VCVTTP.*QS (#132414 ) Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343	2025-03-22 01:10:39 +08:00
Phoebe Wang	d7e7e0af48	[X86][AVX10.2] Remove YMM rounding from VMINMAXP[H,S,D] (#132405 ) Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343	2025-03-22 00:56:23 +08:00
Phoebe Wang	924c7ea76a	[X86][AVX10.2] Remove YMM rounding from VCVT2PS2PHX (#132397 ) Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343	2025-03-21 22:51:51 +08:00
Phoebe Wang	09feaa9261	Revert "[X86][AVX10.2] Support YMM rounding new instructions (#101825 )" (#132362 ) This reverts commit 0dba5381d8c8e4cadc32a067bf2fe5e3486ae53d. YMM rounding was removed from AVX10 whitepaper. Ref: https://cdrdv2.intel.com/v1/dl/getContent/784343 The MINMAX and SATURATING CONVERT instructions will be removed as a follow up.	2025-03-21 20:12:57 +08:00
Phoebe Wang	19d2023a66	[X86][AVX10.2] Use 's_' for saturate-convert intrinsics (#131592 ) - Add '_' after cvt[t]s intrinsics when 's' is for saturation; - Add 's_' for all ipcvt[t] intrinsics since they are all saturation ones; - Move 's' after 'cvt' and add '_' after it for prior `biass` intrinsics; This is to solve potential confusion since 's' before a type usually represents for scalar. Synced with GCC folks and they will change in the same way.	2025-03-21 11:00:51 +08:00
Ricardo Jesus	74f5a028cb	Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#129732 )" (#130625 ) The original patch from #129732 exposed a bug in `getMemVTFromNode`, which was returning incorrect types for fixed length vectors.	2025-03-19 08:25:37 +00:00
Mészáros Gergely	f017073cd8	[Clang][CodeGen] Promote in complex compound divassign (#131453 ) When `-fcomplex-arithmetic=promoted` is set complex divassign `/=` should promote to a wider type the same way division (without assignment) does. Prior to this change, Smith's algorithm would be used for divassign. Fixes: https://github.com/llvm/llvm-project/issues/131129	2025-03-19 07:29:45 +01:00
Mészáros Gergely	1bd6716d33	[Clang][CodeGen] Do not promote if complex divisor is real (#131451 ) Relates-to: https://github.com/llvm/llvm-project/issues/131129	2025-03-19 07:26:54 +01:00
Mészáros Gergely	7b00b0b758	[Clang][NFC] Extend cmplx range tests for #131129 (#131447 ) - Add tests for complex dividend and real divisor - Add tests for complex * real multiplication - Add tests for multiply/divide and assign (`/=`,`*=`) operators	2025-03-19 06:24:02 +01:00
Matt Arsenault	6f44be97d0	IR: Make llvm.fake.use a DefaultAttrsIntrinsic (#131743 ) This shouldn't be special and is just an ordinary sideeffect.	2025-03-19 08:29:04 +07:00
Pedro Lobo	98943c4bd8	[ARM,MVE] Change placeholder from `undef` to `poison` (#131689 ) Call `insertelement` on a `poison` value instead of `undef`.	2025-03-18 22:37:46 +00:00
Aaron Ballman	d781ac1cf0	[C23] Add __builtin_c23_va_start (#131166 ) This builtin is supported by GCC and is a way to improve diagnostic behavior for va_start in C23 mode. C23 no longer requires a second argument to the va_start macro in support of variadic functions with no leading parameters. However, we still want to diagnose passing more than two arguments, or diagnose when passing something other than the last parameter in the variadic function. This also updates the freestanding <stdarg.h> header to use the new builtin, same as how GCC works. Fixes #124031	2025-03-15 11:01:53 -04:00
Brandon Wu	8727097ffd	[RISCV][Sema] Add feature check for target attribute to VSETVL intrinsics (#126064 ) This fixes the target attribute issue for vsetvl and vsetvlmax intrinsics. Fixes #125154	2025-03-14 13:36:47 +08:00
Veera	5073b5fdfa	[CVP] Infer `nuw`/`nsw` flags for TruncInst (#130504 ) Proof: https://alive2.llvm.org/ce/z/U-G7yV Helps: https://github.com/rust-lang/rust/issues/72646 and https://github.com/rust-lang/rust/issues/122734 Rust compiler's current output: https://godbolt.org/z/7E3fET6Md IPSCCP can do this transform but it does not help the motivating issue since it runs only once early in the optimization pipeline. Reimplementing this in CVP folds the motivating issue into a simple `icmp eq` instruction. Fixes #130100	2025-03-12 08:25:24 -04:00
Juan Manuel Martinez Caamaño	7decd04626	[Clang] Add __builtin_elementwise_exp10 in the same fashion as exp/exp2 (#130746 ) Clang has __builtin_elementwise_exp and __builtin_elementwise_exp2 intrinsics, but no __builtin_elementwise_exp10. There doesn't seem to be a good reason not to expose the exp10 flavour of this intrinsic too. This commit introduces this intrinsic following the same pattern as the exp and exp2 versions. Fixes: SWDEV-519541	2025-03-12 09:20:29 +01:00
Younan Zhang	c12761858c	[Clang] Fix the printout of CXXParenListInitExpr involving default arguments (#130731 ) The parentheses are unnecessary IMO because they should have been handled in the parents of such expressions, e.g. in CXXFunctionalCastExpr. Moreover, we shouldn't join CXXDefaultInitExpr either because they are not printed at all.	2025-03-12 10:39:44 +08:00
Juan Manuel Martinez Caamaño	83ec179fc8	[Clang][NFC] Rename and update_cc_test_checks over strictfp-elementwise-builtins.cpp (#130747 )	2025-03-11 17:16:32 +01:00
Benjamin Maxwell	fb397ab1e5	Reland "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#130761 ) Reverts `c40f0fe434` Original description: This updates the existing modf[f\|l] builtin to be lowered via the llvm.modf.* intrinsic (rather than directly to a library call). The Windows 32-bit x86 missing `modff` symbol issue should have been solved in: https://github.com/llvm/llvm-project/pull/130636.	2025-03-11 14:55:33 +00:00
Younan Zhang	f4218753ad	[Clang] Implement P0963R3 "Structured binding declaration as a condition" (#130228 ) This implements the R2 semantics of P0963. The R1 semantics, as outlined in the paper, were introduced in Clang 6. In addition to that, the paper proposes swapping the evaluation order of condition expressions and the initialization of binding declarations (i.e. std::tuple-like decompositions).	2025-03-11 15:41:56 +08:00
Hans Wennborg	c40f0fe434	Revert "Reland "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#129885 )" This broke modff calls on 32-bit x86 Windows. See comment on the PR. > This updates the existing modf[f\|l] builtin to be lowered via the > llvm.modf.* intrinsic (rather than directly to a library call). > > The legalization issues exposed by the original PR (#126750) should have > been fixed in #128055 and #129264. This reverts commit cd1d9a8fab05524a27ffdb251f6def37786b5cc1.	2025-03-10 16:35:03 +01:00
Benson Chu	3b3356043c	Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute" This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.	2025-03-10 10:11:23 -05:00
Benson Chu	1f05703176	[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been saved. - GPRCS1 - GPRCS2 - FPStatusRegs (new) - DPRCS - GPRCS3 - DPRCS2 FPSCR is present on all targets with a VFP, but the FPEXC register is not present on Cortex-M devices, so different amounts of bytes are being pushed onto the stack depending on our target, which would affect alignment for subsequent saves. DPRCS1 will sum up all previous bytes that were saved, and will emit extra instructions to ensure that its alignment is correct. My assumption is that if DPRCS1 is able to correct its alignment to be correct, then all subsequent saves will also have correct alignment. Avoid annotating the saving of FPSCR and FPEXC for functions marked with the interrupt_save_fp attribute, even though this is done as part of frame setup. Since these are status registers, there really is no viable way of annotating this. Since these aren't GPRs or DPRs, they can't be used with .save or .vsave directives. Instead, just record that the intermediate registers r4 and r5 are saved to the stack again. Co-authored-by: Jake Vossen <jake@vossen.dev> Co-authored-by: Alan Phipps <a-phipps@ti.com>	2025-03-10 10:05:15 -05:00
Csanád Hajdú	c579ec66c7	[Clang][AArch64] Add support for SHF_AARCH64_PURECODE ELF section flag (2/3) (#125688 ) Add support for the new SHF_AARCH64_PURECODE ELF section flag: https://github.com/ARM-software/abi-aa/pull/304 The general implementation follows the existing one for ARM targets. Similarly to ARM targets, generating object files with the `SHF_AARCH64_PURECODE` flag set is enabled by the `-mexecute-only`/`-mpure-code` driver flag. Related PRs: * LLVM: https://github.com/llvm/llvm-project/pull/125687 * LLD: https://github.com/llvm/llvm-project/pull/125689	2025-03-10 09:26:53 +00:00
Timm Baeder	d08cf7900d	[clang][bytecode] Implement __builtin_constant_p (#130143 ) Use the regular code paths for interpreting. Add new instructions: `StartSpeculation` will reset the diagnostics pointers to `nullptr`, which will keep us from reporting any diagnostics during speculation. `EndSpeculation` will undo this. The rest depends on what `Emitter` we use. For `EvalEmitter`, we have no bytecode, so we implement `speculate()` by simply visiting the first argument of `__builtin_constant_p`. If the evaluation fails, we push a `0` on the stack, otherwise a `1`. For `ByteCodeEmitter`, add another instruction called `BCP`, that interprets all the instructions following it until the next `EndSpeculation` instruction. If any of those instructions fails, we jump to the `EndLabel`, which brings us right before the `EndSpeculation`. We then push the result on the stack.	2025-03-08 06:06:14 +01:00
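
The implication rules quoted in commit 20b5728b7b (C always implies Zca; C+F implies Zcf on RV32 only; C+D implies Zcd) can be sketched as a small standalone function. This is an illustration only, not the code from that commit: the function name `expandCImplications`, the lowercase extension strings, and the `isRV32` flag are all hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not LLVM's implementation): given a set of enabled
// RISC-V extension names, add the Zc* extensions implied by "c".
std::set<std::string> expandCImplications(std::set<std::string> exts,
                                          bool isRV32) {
    if (exts.count("c") == 0)
        return exts;                 // without C, nothing is implied
    exts.insert("zca");              // C always implies Zca
    if (exts.count("f") && isRV32)
        exts.insert("zcf");          // C+F implies Zcf (RV32 only)
    if (exts.count("d"))
        exts.insert("zcd");          // C+D implies Zcd
    return exts;
}
```

Under these assumptions, `expandCImplications({"c", "f"}, false)` adds only `zca`, since Zcf is limited to RV32.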


9758 Commits