llvm-project

Author	SHA1	Message	Date
Jie Fu	80bc38bc92	[RISCV] Silent a warning (NFC) /llvm-project/clang/lib/CodeGen/Targets/RISCV.cpp:865:9: error: unused variable 'FixedSrcTy' [-Werror,-Wunused-variable] auto *FixedSrcTy = cast<llvm::FixedVectorType>(SrcTy); ^ 1 error generated.	2025-08-20 16:59:12 +08:00
Brandon Wu	52a2e68fda	[clang][RISCV] Fix crash on VLS calling convention (#145489 ) This patch handles struct of fixed vector and struct of array of fixed vector correctly for VLS calling convention in EmitFunctionProlog, EmitFunctionEpilog and EmitCall. Stacked on: https://github.com/llvm/llvm-project/pull/147173	2025-08-20 16:39:02 +08:00
Nikita Popov	246a64a12e	[Clang] Rename HasLegalHalfType -> HasFastHalfType (NFC) (#153163 ) This option is confusingly named. What it actually controls is whether, under the default of `-ffloat16-excess-precision=standard`, it is beneficial for performance to perform calculations on float (without intermediate rounding) or not. For `-ffloat16-excess-precision=none` the LLVM `half` type will always be used, and all backends are expected to legalize it correctly.	2025-08-18 09:23:48 +02:00
Matheus Izvekov	91cdd35008	[clang] Improve nested name specifier AST representation (#147835 ) This is a major change on how we represent nested name qualifications in the AST. * The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy. * The ElaboratedType node is removed, all type nodes in which it could previously apply to can now store the elaborated keyword and name qualifier, tail allocating when present. * TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size. This patch offers a great performance benefit. It greatly improves compilation time for [stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for `test_on2.cpp` in that project, which is the slowest compiling test, this patch improves `-c` compilation time by about 7.2%, with the `-fsyntax-only` improvement being at ~12%. 
This has great results on compile-time-tracker as well: ![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831) This patch also further enables other optimizations in the future, and will reduce the performance impact of template specialization resugaring when that lands. It has some other miscellaneous drive-by fixes. About the review: Yes the patch is huge, sorry about that. Part of the reason is that I started by the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact. There is also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work. How to review: The important changes are all in `clang/include/clang/AST` and `clang/lib/AST`, with also important changes in `clang/lib/Sema/TreeTransform.h`. The rest and bulk of the changes are mostly consequences of the changes in API. PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just for easier rebasing. I plan to rename it back after this lands. Fixes #136624 Fixes https://github.com/llvm/llvm-project/issues/43179 Fixes https://github.com/llvm/llvm-project/issues/68670 Fixes https://github.com/llvm/llvm-project/issues/92757	2025-08-09 05:06:53 -03:00
Gergely Futo	1454db130a	[RISCV] Support resumable non-maskable interrupt handlers (#148134 ) The `rnmi` interrupt attribute value has been added for the `Smrnmi` extension. --------- Co-authored-by: Sam Elliott <sam@lenary.co.uk>	2025-08-04 10:54:50 +02:00
T0b1-iOS	d35931c49e	[Clang][CodeGen][X86] don't coerce int128 into `{i64,i64}` for SysV-like ABIs (#135230 ) Currently, clang coerces (u)int128_t to two i64 IR parameters when they are passed in registers. This leads to broken debug info for them after applying SROA+InstCombine. SROA generates IR like this ([godbolt](https://godbolt.org/z/YrTa4chfc)): ```llvm define dso_local { i64, i64 } @add(i64 noundef %a.coerce0, i64 noundef %a.coerce1) { entry: %a.sroa.2.0.insert.ext = zext i64 %a.coerce1 to i128 %a.sroa.2.0.insert.shift = shl nuw i128 %a.sroa.2.0.insert.ext, 64 %a.sroa.0.0.insert.ext = zext i64 %a.coerce0 to i128 %a.sroa.0.0.insert.insert = or i128 %a.sroa.2.0.insert.shift, %a.sroa.0.0.insert.ext #dbg_value(i128 %a.sroa.0.0.insert.insert, !17, !DIExpression(), !18) // ... !17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !11, line: 1, type: !14) // ... ``` and InstCombine then removes the `or`, moving it into the `DIExpression`, and the `shl` at which point the debug info salvaging in `Transforms/Local` replaces the arguments with `poison` as it does not allow constants larger than 64 bit in `DIExpression`s. I'm working under the assumption that there is interest in fixing this. If not, please tell me. By not coercing `int128_t`s into `{i64, i64}` but keeping them as `i128`, the debug info stays intact and SelectionDAG then generates two `DW_OP_LLVM_fragment` expressions for the two corresponding argument registers. Given that the ABI code for x64 seems to not coerce the argument when it is passed on the stack, it should not lead to any problems keeping it as an `i128` when it is passed in registers. Alternatively, this could be fixed by checking if a constant value fits in 64 bits in the debug info salvaging code and then extending the value on the expression stack to the necessary width. This fixes InstCombine breaking the debug info but then SelectionDAG removes the expression and that seems significantly more complex to debug. 
Another fix may be to generate `DW_OP_LLVM_fragment` expressions when removing the `or` as it gets marked as disjoint by InstCombine. However, I don't know if the KnownBits information is still available at the time the `or` gets removed and it would probably require refactoring of the debug info salvaging code as that currently only seems to replace single expressions and is not designed to support generating new debug records. Converting `(u)int128_t` arguments to `i128` in the IR seems like the simpler solution, if it doesn't cause any ABI issues.	2025-07-17 09:57:32 -07:00
Brad Smith	0d2e11f3e8	Remove Native Client support (#133661 ) Remove the Native Client support now that it has finally reached end of life.	2025-07-15 13:22:33 -04:00
Sven van Haastregt	d45d20e871	[OpenCL] Remove image dimensionality comments; NFC (#147312 ) The code is correct as it aligns with the SPIR-V Specification, but the comment was incorrect.	2025-07-09 10:27:30 +02:00
Brandon Wu	6ee375147b	[RISCV] Correct type lowering of struct of fixed-vector array in VLS (#147173 ) Currently, struct of fixed-vector array is flattened and lowered to scalable vector. However only struct of 1-element-fixed-vector array should be lowered that way, struct of fixed-vector array of length >1 should be lowered to vector tuple type. https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418/files#diff-3a934f00cffdb3e509722753126a2cf6082a7648ab3b9ca8cbb0e84f8a6a12edR555-R558	2025-07-08 21:14:40 -07:00
Shafik Yaghmour	6efa366b43	[Clang][NFC] Avoid copies by using std::move (#146960 ) Static analysis flagged this code as using copies when we could use move instead. I used a temporary in some cases instead of an explicit move.	2025-07-07 17:53:45 -07:00
Eli Friedman	2aa0f0a3bd	[AArch64] Add option -msve-streaming-vector-bits= . (#144611 ) This is similar to -msve-vector-bits, but for streaming mode: it constrains the legal values of "vscale", allowing optimizations based on that constraint. This also fixes conversions between SVE vectors and fixed-width vectors in streaming functions with -msve-vector-bits and -msve-streaming-vector-bits. This rejects any use of arm_sve_vector_bits types in streaming functions; if it becomes relevant, we could add arm_sve_streaming_vector_bits types in the future. This doesn't touch the __ARM_FEATURE_SVE_BITS define.	2025-07-03 13:44:38 -07:00
Steven Perron	68173c8091	[HLSL][SPRIV] Handle signed RWBuffer correctly (#144774 ) In Vulkan, the signedness of the accesses to images has to match the signedness of the backing image. See https://docs.vulkan.org/spec/latest/chapters/textures.html#textures-input, where it says the behaviour is undefined if > the signedness of any read or sample operation does not match the signedness of the image’s format. Users who define say an `RWBuffer<int>` will create a Vulkan image with a signed integer format. So the HLSL that is generated must match that expectation. The solution we use is to generate a `spirv.SignedImage` target type for signed integer instead of `spirv.Image`. The two types are otherwise the same. The backend will add the `signExtend` image operand to accesses to the image to ensure the image is accessed as a signed image. Fixes #144580	2025-07-02 12:09:47 -04:00
Sarah Spall	23be14b222	[HLSL][SPIRV] Boolean in a RawBuffer should be i32 and Boolean vector in a RawBuffer should be <N x i32> (#144929 ) Instead of converting the type in a RawBuffer to its HLSL type using 'ConvertType', use 'ConvertTypeForMem'. ConvertTypeForMem handles booleans being i32 and boolean vectors being < N x i32 >. Add tests to show booleans and boolean vectors in RawBuffers now have the correct type of i32, and <N x i32> respectively. Closes #141089	2025-06-27 13:43:03 -07:00
Alex Voicu	992f0d1225	[Clang][SPIRV][AMDGPU] Override `supportsLibCall` for AMDGCNSPIRV (#143814 ) The `supportsLibCall` predicate is used to select whether some math builtins get expanded in the FE or they get lowered into libcalls. The default implementation unconditionally returns true, which is problematic for AMDGCN-flavoured SPIRV, as AMDGPU does not support any libcalls at the moment. This change overrides the predicate in order to reflect this and correctly do the expected FE expansion when targeting AMDGCN-flavoured SPIRV.	2025-06-25 11:22:59 +01:00
Kazu Hirata	ae372bfca8	[CodeGen] Use range-based for loops (NFC) (#145142 )	2025-06-21 08:20:57 -07:00
Nick Sarnie	86d1d6b2c0	[clang] Use TargetInfo to determine device kernel calling convention (#144728 ) We should abstract this logic away to `TargetInfo`. See https://github.com/llvm/llvm-project/pull/137882 for more information. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-06-18 20:50:12 +00:00
David Green	030a471753	[AArch64][Clang] Exclude address spaces from pointer-only coercion types. As reported on #135064, the generic pointer coercion code in CoerceIntOrPtrToIntOrPtr cannot handle address space casts (it tries to bitcast the pointers). This bails out if an address space qualifier is found on the pointer.	2025-06-12 20:51:58 +01:00
David Green	5f648c370e	[AArch64] Change the coercion type of structs with pointer members. (#135064 ) The aim here is to avoid a ptrtoint->inttoptr round-trip through the function argument whilst keeping the calling convention the same. Given a struct which is <= 128bits in size, which can only contain either 1 or 2 pointers, we convert to a ptr or [2 x ptr] as opposed to the old coercion that uses i64 or [2 x i64]. This helps alias analysis produce more accurate results.	2025-06-10 07:04:54 +01:00
Nick Sarnie	3b9ebe9201	[clang] Simplify device kernel attributes (#137882 ) We have multiple different attributes in clang representing device kernels for specific targets/languages. Refactor them into one attribute with different spellings to make it more easily scalable for new languages/targets. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-06-05 14:15:38 +00:00
Ami-zhang	8c65f68330	[clang][LoongArch] Add support for the _Float16 type (#141703 ) Enable _Float16 for LoongArch target. Additionally, this change fixes incorrect ABI lowering of _Float16 in the case of structs containing fp16 that are eligible for passing via GPR+FPR or FPR+FPR. Finally, it also fixes int16 -> __fp16 conversion code gen, which uses generic LLVM IR rather than llvm.convert.to.fp16 intrinsics.	2025-06-03 14:26:11 +08:00
Nikita Popov	e2b536431d	[CodeGen] Move CodeGenPGO behind unique_ptr (NFC) (#142155 ) The InstrProf headers are very expensive. Avoid including them in all of CodeGen/ by moving the CodeGenPGO member behind a unique_ptr. This reduces clang build time by 0.8%.	2025-06-02 09:51:54 +02:00
Steven Perron	5584020d8a	[HLSL][SPIRV] Implement the SPIR-V target type for cbuffers. (#140061 ) This change implements the type used to represent cbuffer for SPIR-V. Fixes https://github.com/llvm/llvm-project/issues/138274.	2025-05-28 07:51:03 -04:00
David Green	3a42cbd47d	[AArch64] Rename AArch64SVEACLETypes.def and add base SVE_TYPE.	2025-05-28 12:26:54 +01:00
Cassandra Beckley	5a4571133a	[HLSL] Implement `SpirvType` and `SpirvOpaqueType` (#134034 ) This implements the design proposed by [Representing SpirvType in Clang's Type System](https://github.com/llvm/wg-hlsl/pull/181). It creates `HLSLInlineSpirvType` as a new `Type` subclass, and `__hlsl_spirv_type` as a new builtin type template to create such a type. This new type is lowered to the `spirv.Type` target extension type, as described in [Target Extension Types for Inline SPIR-V and Decorated Types](https://github.com/llvm/wg-hlsl/blob/main/proposals/0017-inline-spirv-and-decorated-types.md).	2025-05-27 11:40:54 -04:00
Kazu Hirata	8075c15f54	[CodeGen] Remove unused includes (NFC) (#141418 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-05-25 10:55:28 -07:00
Anatoly Trosinenko	f10a90587f	[clang][AArch64] Move initialization of ptrauth-* function attrs (#140277 ) Move the initialization of ptrauth-* function attributes near the initialization of branch protection attributes. The semantics of these groups of attributes partially overlaps, so handle both groups in getDefaultFunctionAttributes() and setTargetAttributes() functions to prevent getting them out of sync. This fixes C++ TLS wrappers.	2025-05-20 12:50:58 +03:00
choikwa	77de8a0c0a	[AMDGPU][clang] provide device implementation for __builtin_logb and … (#129347 ) …__builtin_scalbn Clang generates library calls for __builtin_* functions which can be a problem for GPUs that cannot handle them. This patch generates call to device implementation for __builtin_logb and ldexp intrinsic for __builtin_scalbn.	2025-05-19 14:11:31 -04:00
Sam Elliott	cfc5baf6e6	[RISCV] SiFive CLIC Support (#132481 ) This Change adds support for two SiFive vendor attributes in clang: - "SiFive-CLIC-preemptible" - "SiFive-CLIC-stack-swap" These can be given together, and can be combined with "machine", but cannot be combined with any other interrupt attribute values. These are handled primarily in RISCVFrameLowering: - "SiFive-CLIC-stack-swap" entails swapping `sp` with `sf.mscratchcsw` at function entry and exit, which holds the trap stack pointer. - "SiFive-CLIC-preemptible" entails saving `mcause` and `mepc` before re-enabling interrupts using `mstatus`. To save these, `s0` and `s1` are first spilled to the stack, and then the values are read into these registers. If these registers are used in the function, their values will be spilled a second time onto the stack with the generic callee-saved-register handling. At the end of the function interrupts are disabled again before `mepc` and `mcause` are restored. This Change also adds support for the following two experimental extensions, which only contain CSRs: - XSfsclic - for SiFive's CLIC Supervisor-Mode CSRs - XSfmclic - for SiFive's CLIC Machine-Mode CSRs The latter is needed for interrupt support. The CFI information for this implementation is not correct, but I'd prefer to correct this in a follow-up. While it's unlikely anyone wants to unwind through a handler, the CFI information is also used by debuggers so it would be good to get it right. Co-authored-by: Ana Pazos <apazos@quicinc.com>	2025-04-25 17:12:27 -07:00
Victor Campos	6738cfe0a4	Mark CXX module initializer with PACBTI attributes (#133716 ) The CXX module initializer function, which is called at program startup, needs to be tagged with Pointer Authentication and Branch Target Identification marks whenever relevant. Before this patch, in CPUs set up for PACBTI execution, the function wasn't protected with return address signing and no BTI instruction was inserted at the start of it, thus leading to an execution fault. This patch fixes the issue by marking the function with the function attributes related to PAC and BTI if relevant.	2025-04-25 11:04:34 +01:00
Benson Chu	50320504c8	[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been saved. - GPRCS1 - GPRCS2 - FPStatusRegs (new) - DPRCS - GPRCS3 - DPRCS2 FPSCR is present on all targets with a VFP, but the FPEXC register is not present on Cortex-M devices, so different amounts of bytes are being pushed onto the stack depending on our target, which would affect alignment for subsequent saves. DPRCS1 will sum up all previous bytes that were saved, and will emit extra instructions to ensure that its alignment is correct. My assumption is that if DPRCS1 is able to correct its alignment to be correct, then all subsequent saves will also have correct alignment. Avoid annotating the saving of FPSCR and FPEXC for functions marked with the interrupt_save_fp attribute, even though this is done as part of frame setup. Since these are status registers, there really is no viable way of annotating this. Since these aren't GPRs or DPRs, they can't be used with .save or .vsave directives. Instead, just record that the intermediate registers r4 and r5 are saved to the stack again. Co-authored-by: Jake Vossen <jake@vossen.dev> Co-authored-by: Alan Phipps <a-phipps@ti.com>	2025-04-22 14:31:29 -05:00
Sarah Spall	7810d84844	[HLSL] Boolean in a RawBuffer should be i32 and Boolean vector in a RawBuffer should be <N x i32> (#135848 ) Instead of converting the type in a RawBuffer to its HLSL type using 'ConvertType', use 'ConvertTypeForMem'. ConvertTypeForMem handles booleans being i32 and boolean vectors being < N x i32 >. Add tests to show booleans and boolean vectors in RawBuffers now have the correct type of i32, and <N x i32> respectively. Closes #135635	2025-04-21 15:11:39 -07:00
Kazu Hirata	f4c76bba59	[clang] Use llvm::append_range (NFC) (#136256 ) This patch replaces: llvm::copy(Src, std::back_inserter(Dst)); with: llvm::append_range(Dst, Src); for brevity. One side benefit is that llvm::append_range eventually calls llvm::SmallVector::reserve if Dst is of llvm::SmallVector.	2025-04-18 00:15:13 -07:00
Tom Honermann	0348ff5158	[SYCL] Basic code generation for SYCL kernel caller offload entry point functions. (#133030 ) A function declared with the `sycl_kernel_entry_point` attribute, sometimes called a SYCL kernel entry point function, specifies a pattern from which the parameters and body of an offload entry point function, sometimes called a SYCL kernel caller function, are derived. SYCL kernel caller functions are emitted during SYCL device compilation. Their parameters and body are derived from the `SYCLKernelCallStmt` statement and `OutlinedFunctionDecl` declaration associated with their corresponding SYCL kernel entry point function. A distinct SYCL kernel caller function is generated for each SYCL kernel entry point function defined as a non-inline function or ODR-used in the translation unit. The name of each SYCL kernel caller function is parameterized by the SYCL kernel name type specified by the `sycl_kernel_entry_point` attribute attached to the corresponding SYCL kernel entry point function. For the moment, the Itanium ABI mangled name for typeinfo data (`_ZTS<type>`) is used to name these functions; a future change will switch to a more appropriate naming scheme. The calling convention used for a SYCL kernel caller function is target dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently provided. These functions are required to observe the language restrictions for SYCL devices as specified by the SYCL 2020 specification; this includes a forward progress guarantee and prohibits recursion. Only SYCL kernel caller functions, functions declared as `SYCL_EXTERNAL`, and functions directly or indirectly referenced from those functions should be emitted during device compilation. Pruning of other declarations has not yet been implemented. --------- Co-authored-by: Elizabeth Andrews <elizabeth.andrews@intel.com>	2025-04-17 09:14:45 -04:00
Jonas Paulsson	6d03f51f0c	[SystemZ] Add support for 16-bit floating point. (#109164 ) - _Float16 is now accepted by Clang. - The half IR type is fully handled by the backend. - These values are passed in FP registers and converted to/from float around each operation. - Compiler-rt conversion functions are now built for s390x including the missing extendhfdf2 which was added. Fixes #50374	2025-04-16 20:02:56 +02:00
Shilei Tian	ce01e4e2f6	[Clang][OpenCL][AMDGPU] Use `byref` for aggregate OpenCL kernel arguments (#134892 ) Due to a previous workaround allowing kernels to be called from other functions, Clang currently doesn't use the `byref` attribute for aggregate kernel arguments. The issue was recently resolved in https://github.com/llvm/llvm-project/pull/115821. With that fix, we can now enable the use of `byref` consistently across all languages. Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Fixes SWDEV-247226. Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-04-13 10:17:55 -04:00
Aniket Lal	642481a428	[Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel (#115821 ) This feature is currently not supported in the compiler. To facilitate this we emit a stub version of each kernel function body with different name mangling scheme, and replaces the respective kernel call-sites appropriately. Fixes https://github.com/llvm/llvm-project/issues/60313 D120566 was an earlier attempt made to upstream a solution for this issue. --------- Co-authored-by: anikelal <anikelal@amd.com>	2025-04-08 10:29:30 +05:30
Farzon Lotfi	82103dfae9	Revert "Reland [Clang][Cmake] fix libtool duplicate member name warnings" (#134656 ) Reverts llvm/llvm-project#133850	2025-04-07 10:00:53 -04:00
Farzon Lotfi	0d71d9ab28	Reland [Clang][Cmake] fix libtool duplicate member name warnings (#133850 ) fixes https://github.com/llvm/llvm-project/issues/133199 As of the third commit the fix to the linker missing references in `Targets/DirectX.cpp` found in https://github.com/llvm/llvm-project/pull/133776 was fixed by moving `HLSLBufferLayoutBuilder.cpp` to `clang/lib/CodeGen/Targets/`. It fixes the circular reference issue found in https://github.com/llvm/llvm-project/pull/133619 for all `-DBUILD_SHARED_LIBS=ON` builds by removing `target_link_libraries` from the sub directory cmake files. testing for amdgpu offload was done via `cmake -B ../llvm_amdgpu -S llvm -GNinja -C offload/cmake/caches/Offload.cmake -DCMAKE_BUILD_TYPE=Release` PR https://github.com/llvm/llvm-project/pull/132252 Created a second file that shared <TargetName>.cpp in clang/lib/CodeGen/CMakeLists.txt For example There were two AMDGPU.cpp's one in TargetBuiltins and the other in Targets. Even though these were in different directories libtool warns that it might not distinguish them because they share the same base name. There are two potential fixes. The easy fix is to rename one of them and keep one cmake file. That solution though doesn't future proof this problem in the event of a third <TargetName>.cpp and it seems teams want to just use the target name https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483. The alternative fix that this PR went with is to separate the cmake files into their own sub directories as static libs.	2025-04-07 09:53:07 -04:00
Steven Perron	16603d838c	[HLSL] Add SPIR-V target type for RWStructuredBuffers (#133468 ) This PR adds the target type for main storage for HLSL raw buffer types. It does not handle the counter variables that are associated with those buffers. This is implementing part of https://github.com/llvm/wg-hlsl/blob/main/proposals/0018-spirv-resource-representation.md. We do not handle other HLSL raw buffer types.	2025-04-01 16:59:46 -04:00
Farzon Lotfi	bdae91b08b	Revert "[Clang][Cmake] fix libtool duplicate member name warnings" (#133795 ) Reverts llvm/llvm-project#133619	2025-03-31 17:00:38 -04:00
Farzon Lotfi	cc2b432614	[Clang][Cmake] fix libtool duplicate member name warnings (#133619 ) fixes #133199 PR #132252 Created a second file that shared `<TargetName>.cpp` in `clang/lib/CodeGen/CMakeLists.txt` For example There were two `AMDGPU.cpp`'s one in `TargetBuiltins` and the other in `Targets`. Even though these were in different directories `libtool` warns that it might not distinguish them because they share the same base name. There are two potential fixes. The easy fix is to rename one of them and keep one cmake file. That solution though doesn't future proof this problem in the event of a third `<TargetName>.cpp` and it seems teams want to just use the target name https://github.com/llvm/llvm-project/pull/132252#issuecomment-2758178483. The alternative fix is to separate the cmake files into their own sub directories. I chose to create static libraries. It might have been possible to build an OBJECT, but I only saw examples of this in compiler-rt and test directories so assumed there was a reason it wasn't used.	2025-03-31 14:21:22 -04:00
Joseph Huber	772173f548	[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870 ) Summary: When we were first porting to COV5, this led to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.	2025-03-28 07:35:16 -05:00
Ben Shi	597accfea6	[clang][CodeGen][AVR] Fix a crash in AVRABIInfo (#131976 ) fixes https://github.com/llvm/llvm-project/issues/131967	2025-03-22 13:22:32 +08:00
Alexander Shaposhnikov	297f0b3f4c	[CudaSPIRV] Allow using integral non-type template parameters as attribute args (#131546 ) Allow using integral non-type template parameters as attribute arguments of reqd_work_group_size and work_group_size_hint. Test plan: ninja check-all	2025-03-19 10:11:18 -07:00
Helena Kotas	cb64a363ca	[HLSL] Make sure `isSigned` flag is set on target type for `TypedBuffer` resources with signed int vectors (#130223 ) Fixes #130191	2025-03-14 13:09:21 -07:00
Helena Kotas	73e12de062	[HLSL] Implement explicit layout for default constant buffer ($Globals) (#128991 ) Processes `HLSLResourceBindingAttr` attributes that represent `register(c#)` annotations on default constant buffer declarations and applies its value to the buffer layout. Any default buffer declarations without an explicit `register(c#)` annotation are placed after the elements with explicit layout. This PR also adds a test case for a `cbuffer` that does not have `packoffset` on all declarations. Same layout rules apply here as well. Fixes #126791	2025-03-12 22:35:07 -07:00
Benson Chu	3b3356043c	Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute" This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.	2025-03-10 10:11:23 -05:00
Benson Chu	1f05703176	[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been saved. - GPRCS1 - GPRCS2 - FPStatusRegs (new) - DPRCS - GPRCS3 - DPRCS2 FPSCR is present on all targets with a VFP, but the FPEXC register is not present on Cortex-M devices, so different amounts of bytes are being pushed onto the stack depending on our target, which would affect alignment for subsequent saves. DPRCS1 will sum up all previous bytes that were saved, and will emit extra instructions to ensure that its alignment is correct. My assumption is that if DPRCS1 is able to correct its alignment to be correct, then all subsequent saves will also have correct alignment. Avoid annotating the saving of FPSCR and FPEXC for functions marked with the interrupt_save_fp attribute, even though this is done as part of frame setup. Since these are status registers, there really is no viable way of annotating this. Since these aren't GPRs or DPRs, they can't be used with .save or .vsave directives. Instead, just record that the intermediate registers r4 and r5 are saved to the stack again. Co-authored-by: Jake Vossen <jake@vossen.dev> Co-authored-by: Alan Phipps <a-phipps@ti.com>	2025-03-10 10:05:15 -05:00
Matt Arsenault	0d2c55cb96	AMDGPU: Move enqueued block handling into clang (#128519 ) The previous implementation wasn't maintaining a faithful IR representation of how this really works. The value returned by createEnqueuedBlockKernel wasn't actually used as a function, and hacked up later to be a pointer to the runtime handle global variable. In reality, the enqueued block is a struct where the first field is a pointer to the kernel descriptor, not the kernel itself. We were also relying on passing around a reference to a global using a string attribute containing its name. It's better to base this on a proper IR symbol reference during final emission. This now avoids using a function attribute on kernels and avoids using the additional "runtime-handle" attribute to populate the final metadata. Instead, associate the runtime handle reference to the kernel with the !associated global metadata. We can then get a final, correctly mangled name at the end. I couldn't figure out how to get rename-with-external-symbol behavior using a combination of comdats and aliases, so leaves an IR pass to externalize the runtime handles for codegen. If anything breaks, it's most likely this, so leave avoiding this for a later step. Use a special section name to enable this behavior. This also means it's possible to declare enqueuable kernels in source without going through the dedicated block syntax or other dedicated compiler support. We could move towards initializing the runtime handle in the compiler/linker. I have a working patch where the linker sets up the first field of the handle, avoiding the need to export the block kernel symbol for the runtime. We would need new relocations to get the private and group sizes, but that would avoid the runtime's special case handling that requires the device_enqueue_symbol metadata field. https://reviews.llvm.org/D141700	2025-03-10 19:54:04 +07:00
Kito Cheng	55f86cf023	[RISCV][clang] Fix wrong VLS CC detection (#130107 ) RISCVABIInfo::detectVLSCCEligibleStruct should exit early if the VLS calling convention is not used; however, the sentinel value was not set correctly: it should be zero instead of one.	2025-03-07 11:15:20 +08:00


221 Commits