llvm-project

Author	SHA1	Message	Date
Oleksandr T.	2e8e254d18	[Clang] include attribute scope in diagnostics (#144619 ) This patch updates diagnostics to print fully qualified attribute names, including scope when present.	2025-07-08 11:36:52 +03:00
Artem Belevich	59ef2c34a1	[CUDA] Disallow use of address_space(N) on CUDA device variables. (#142857 ) The variables have implicit host-side shadow instances and explicit address space attribute breaks them on the host.	2025-06-09 10:39:04 -07:00
Denis.G	b3a6d434a7	[Clang] Allow parsing arbitrary order of attributes for declarations (#133107 ) Enable parsing the alignas attribute after GNU attributes, before ParseDeclaration. This might be useful for CUDA code where __shared__ and other specifiers may be mixed with align. I'd be glad to see if there are any better places or other techniques to process this attribute without interrupting the current flow of parsing.	2025-05-09 14:10:18 -04:00
Yaxun (Sam) Liu	c16297cd3f	[CUDA][HIP] Fix host/device attribute of builtin (#138162 ) When a builtin function is passed a pointer with a different address space, clang creates an overloaded builtin function but does not copy the host/device attribute. This causes error when the builtin is called by device functions since CUDA/HIP relies on the host/device attribute to treat a builtin function as callable on both host and device sides. Fixed by copying the host/device attribute of the original builtin function to the created overloaded builtin function.	2025-05-07 22:03:33 -04:00
cor3ntin	8c5a307bd8	[Clang] Bypass TAD during overload resolution if a perfect match exists (#136203 ) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (sections 1-2, not 3). If, during overload resolution, there is a non-template candidate that would always be picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist. The goal is to be able to merge https://github.com/llvm/llvm-project/pull/122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang, we store in OverloadCandidateSet enough information to be able to deduce template candidates from OverloadCandidateSet::BestViableFunction. This means the lifetime of any object used by the template argument must outlive a call to AddTemplateCandidate. This two-phase resolution is not performed for some initialization, as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes https://github.com/llvm/llvm-project/issues/62096 Fixes https://github.com/llvm/llvm-project/issues/74581 Reapplies https://github.com/llvm/llvm-project/pull/133426	2025-04-18 06:10:58 +02:00
cor3ntin	2a91d04b02	Revert "[Clang] Bypass TAD during overload resolution if a perfect match exists" (#136113 ) Reverts llvm/llvm-project#136018 Still some bots failing https://lab.llvm.org/buildbot/#/builders/52/builds/7643	2025-04-17 11:00:56 +02:00
cor3ntin	377ec36b32	[Clang] Bypass TAD during overload resolution if a perfect match exists (#136018 ) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (sections 1-2, not 3). If, during overload resolution, there is a non-template candidate that would always be picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist. The goal is to be able to merge https://github.com/llvm/llvm-project/pull/122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang, we store in `OverloadCandidateSet` enough information to be able to deduce template candidates from `OverloadCandidateSet::BestViableFunction`. This means the lifetime of any object used by the template argument must outlive a call to `AddTemplateCandidate`. This two-phase resolution is not performed for some initialization, as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes https://github.com/llvm/llvm-project/issues/62096 Fixes https://github.com/llvm/llvm-project/issues/74581 Reapplies #133426	2025-04-17 08:09:55 +02:00
cor3ntin	6ccc9280ba	Revert "[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists" (#135993 ) Reverts llvm/llvm-project#133426 This is failing on some bots https://lab.llvm.org/buildbot/#/builders/163/builds/17265	2025-04-16 19:40:28 +02:00
cor3ntin	facc57fc25	[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists (#133426 ) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (sections 1-2, not 3) If, during overload resolution, a non-template candidate is always picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist. The goal is to be able to merge #122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang, we store in `OverloadCandidateSet` enough information to deduce template candidates from `OverloadCandidateSet::BestViableFunction`. This means the lifetime of any object used by the template argument must outlive a call to `AddTemplateCandidate`. This two-phase resolution is not performed for some initialization as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=edc22c64e527171041876f26a491bb1d03d905d5&to=8170b860bd4b70917005796c05a9be013a95abb2&stat=instructions%3Au Fixes #62096 Fixes #74581 Fixes #53454	2025-04-16 19:09:45 +02:00
Alexander Shaposhnikov	297f0b3f4c	[CudaSPIRV] Allow using integral non-type template parameters as attribute args (#131546 ) Allow using integral non-type template parameters as attribute arguments of reqd_work_group_size and work_group_size_hint. Test plan: ninja check-all	2025-03-19 10:11:18 -07:00
Yaxun (Sam) Liu	d37a39207b	[CUDA][HIP] fix virtual dtor host/device attr (#128926 ) Currently, CUDA/HIP users hit a problem when they use a template class with a virtual dtor and a std::string data member with C++20 and MSVC: when the template class is explicitly instantiated, there is an error about a host function called by a host device function (this used to be undefined symbols in the linking stage, before member destructors were checked by deferred diagnostics). It was caused by clang inferring host/device attributes for default dtors. Since all dtors of member and parent classes have implicit host device attrs, clang infers that the virtual dtor has implicit host and device attrs. Since the virtual dtor of an explicitly instantiated template class must be emitted, this causes the constexpr dtor of std::string to be emitted, which calls a host function that was not emitted on the device side. This is a serious issue since it prevents users from using std::string with C++20 on Windows. When inferring the host device attr of the virtual dtor of an explicit template class instantiation, clang should be conservative since the dtor is sure to be emitted. Since an implicit host device function may call a host function, clang cannot assume it is always available on device. This guarantees that dtors that may call host functions do not have an implicit device attr and therefore will not be emitted on the device side. Fixes: https://github.com/llvm/llvm-project/issues/108548 Fixes: SWDEV-517435	2025-03-03 10:23:35 -05:00
Yaxun (Sam) Liu	0f0665db06	[CUDA][HIP] check dtor in deferred diag (#129117 ) Currently the deferred diag fails to diagnose calls to host functions in host device functions in device compilation when they are triggered by destructors. This can be further divided into two issues: 1. the deferred diag visitor does not visit the dtors of member and parent classes when visiting a dtor, which it should; 2. the deferred diag visitor does not visit the virtual dtor of an explicit template class instantiation, which it should. Due to these issues, some constexpr functions which call host functions are emitted on the device side, which causes undefined symbols in the linking stage, as revealed by https://github.com/llvm/llvm-project/issues/108548 By fixing these issues, clang will diagnose the problems early during compilation instead of during linking.	2025-02-28 09:58:19 -05:00
Joseph Huber	3d9409f5bc	[NVPTX] Make ctor/dtor lowering always enabled in NVPTX (#126544 ) Summary: Currently we conditionally enable NVPTX lowering depending on the language (C/C++/OpenMP). Unfortunately this causes problems because this option is only present if the backend was enabled, which causes this to error if you try to make LLVM-IR. This patch instead makes it the only accepted lowering. The reason we had it as opt-in before is because it is not handled by CUDA. So, this patch also introduces diagnostics to prevent all creation of device-side global constructors and destructors. We already did this for variables, now we do it for attributes as well. This inverts the responsibility of blocking this from the backend to the language, like it should be given that support for this is language dependent.	2025-02-10 14:25:44 -06:00
Brian Foley	39879e4f40	[Sema] Note member decl when initializer list default constructs member (#121854 ) Recently I had a scenario where I had: 1. A class C with many members m_1...m_n of the same type T 2. T's default constructor was deleted 3. I accidentally omitted an explicitly constructed member in the initializer list C() : m_1(foo), m_2(bar), ... { }. Clang told me that T's default constructor was deleted, and told me that the call to T() was in C() (which it implicitly was), but didn't tell me which member was being default constructed. It was difficult to fix this problem because I had no easy way to list all the members of type T in C and C's superclasses, which would have let me find which member was missing. clang/test/CXX/class/class.init/p1.cpp is a simplified version of this problem (a2 is missing from the initializer list of B).	2025-02-03 19:57:37 +01:00
Yaxun (Sam) Liu	1c99907222	[CUDA][HIP] Fix overriding of constexpr virtual function (#121986 ) In C++20 a constexpr virtual function is allowed. In C++17, although a non-pure virtual function is not allowed to be constexpr, a pure virtual function is allowed to be constexpr and is allowed to be overridden by a non-constexpr virtual function in the derived class. The following code compiles as C++: ``` class A { public: constexpr virtual int f() = 0; }; class B : public A { public: int f() override { return 42; } }; ``` However, it fails to compile as CUDA or HIP code. The reason: A::f() is implicitly a host device function whereas B::f() is a host function. Since they have different targets, clang does not treat B::f() as an override of A::f(). Instead, it treats B::f() as a name-hiding non-virtual function for A::f(), and diagnoses it. This causes any CUDA/HIP program using the C++ standard header file `<format>` from g++-13 to fail to compile since such a usage pattern shows up there: ``` /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3564:34: error: non-virtual member function marked 'override' hides virtual member function 3564 \| _M_format_arg(size_t __id) override \| ^ /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3538:30: note: hidden overloaded virtual function 'std::__format::_Scanner<char>::_M_format_arg' declared here 3538 \| constexpr virtual void _M_format_arg(size_t __id) = 0; \| ^ ``` This is a serious issue and there is no workaround. This patch allows a non-constexpr function to override a constexpr virtual function for CUDA and HIP. This should be OK since a non-constexpr function without an explicit host or device attribute can only be called in host functions. Fixes: SWDEV-507350	2025-01-09 14:05:27 -05:00
Alexander Shaposhnikov	df13acf344	[CudaSPIRV] Add support for optional spir-v attributes (#116589 ) Add support for optional spir-v attributes. Test plan: ninja check-all	2024-11-19 13:14:45 -08:00
Artem Belevich	7c3fdcc276	[CUDA] Add support for __grid_constant__ attribute (#114589 ) LLVM support for the attribute has been implemented already, so it just plumbs it through to the CUDA front-end. One notable difference from NVCC is that the attribute can be used regardless of the targeted GPU. On the older GPUs it will just be ignored. The attribute is a performance hint, and does not warrant a hard error if compiler can't benefit from it on a particular GPU variant.	2024-11-05 10:48:54 -08:00
Christudasan Devadasan	91fd1b4f32	[HIP] Always add -fnative-half-arguments-and-returns cmdline option. (#113335 ) This command-line option is now required while building the HIP applications (mainly for the host side) after we enabled __fp16 args and return values with patches D133885 & D145345.	2024-10-22 22:07:35 +05:30
darkbuck	fa84297002	[clang][CUDA] Add 'noconvergent' function and statement attribute - For languages following SPMD/SIMT programming model, functions and call sites are marked 'convergent' by default. 'noconvergent' is added in this patch to allow developers to remove that 'convergent' attribute when it's safe. Reviewers: nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/100637	2024-07-31 11:30:48 -04:00
c8ef	3f222f3bc6	[NFC] Fix some typos (#98791 )	2024-07-14 13:28:11 +02:00
Raymond Tian	8477ca6e8e	[HIP][Clang][Sema] Fix crash when calling builtins with pointer arguments (#95957 ) Crashed when the number of args passed was less than number of parameters in builtin definition, because we were indexing the list of args while iterating through the entire number of parameters.	2024-07-01 14:54:04 -04:00
Lukacma	8a46bbbc22	[Clang] Remove preprocessor guards and global feature checks for NEON (#95224 ) To enable function multi-versioning (FMV), current checks which rely on cmd line options or global macros to see if target feature is present need to be removed. This patch removes those for NEON and also implements changes to NEON header file as proposed in [ACLE](https://github.com/ARM-software/acle/pull/321).	2024-06-25 17:19:42 +02:00
Yaxun (Sam) Liu	53d2f4d967	[CUDA][HIP] warn incompatible redeclare (#77359 ) nvcc warns about the following code: `void f(); __device__ void f() {}` but clang does not since clang allows device function to overload host function. Users want clang to emit similar warning to help code to be compatible with nvcc. Since this may cause regression with existing code, the warning is off by default and can be enabled by -Wnvcc-compat. It won't cause warning in system headers, even with -Wnvcc-compat.	2024-06-10 10:08:26 -04:00
Alex Voicu	88e2bb4092	[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796 ) This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features: - AMDGCN inline ASM is allowed/supported, under the assumption that the [SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc) extension is enabled/used - AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is enabled when using the downstream translator - the featureset matches the union of AMDGCN targets' features - the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the [SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc) extension is enabled/used, case in which the extant SPIRV datalayout string would lead to pointers to function pointing to the private address space, which would be wrong. Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).	2024-06-07 11:50:23 +01:00
Aaron Ballman	b49ce9c304	Fix more diagnostic wording for style; NFC (#93190 ) This tries to fix all of the places where a diagnostic message starts with a capital letter (other than acronyms or proper nouns) or ends with punctuation (other than a question mark). This is in support of a planned change to tablegen to start diagnosing incorrect diagnostic message styles.	2024-05-23 14:50:29 -04:00
Fangrui Song	7e59223ac4	[test] %clang_cc1: remove redundant actions ParseFrontendArgs takes the last OPT_Action_Group option. The other actions are overridden.	2024-05-05 10:46:06 -07:00
Fangrui Song	7c1d9b15ee	[test] %clang_cc1: remove redundant actions	2024-05-04 23:08:11 -07:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Pranav Kant	318bff6811	[clang][CUDA] Disable float128 diagnostics for device compilation (#83918 )	2024-03-06 16:40:23 -08:00
Joseph Huber	53e96984b6	[NVPTX] Enable the _Float16 type for NVPTX compilation (#82436 ) Summary: The PTX target supports the f16 type natively and we already have a few LLVM backend tests that support the LLVM-IR. We should be able to enable this for generic use. This is done prior to the f16 math functions being written in the GPU libc case.	2024-02-20 18:12:27 -06:00
Fangrui Song	fed564432c	[Sema] atomic_compare_exchange: check failure memory order (#74959 ) For `__atomic_compare_exchange{,_n}/__c11_atomic_compare_exchange_{strong,weak}`, GCC checks both the success memory order and the failure memory order under the default -Winvalid-memory-model ("memory model" is confusing here and "memory order" is much more common in the atomic context). * The failure memory order, if a constant, must be one of relaxed/consume/acquire/seq_cst. Clang checks just the success memory order under the default -Watomic-memory-ordering. This patch checks the failure memory order.	2023-12-14 11:03:28 -08:00
CarolineConcatto	ed2d497291	[Clang][AArch64] Add fix vector types to header into SVE (#73258 ) This patch is needed for the reduction instructions in sve2.1. It adds a new header to sve with all the fixed vector types. The new types are only added if neon is not declared.	2023-12-13 08:59:41 +00:00
Yaxun (Sam) Liu	2b76e20ea7	[CUDA][HIP] allow trivial ctor/dtor in device var init (#73140 ) Treat ctor/dtor in device var init as host device function so that they can be used to initialize file-scope device variables to match nvcc behavior. If they are non-trivial they will be diagnosed. We cannot add implicit host device attrs to non-trivial ctor/dtor since determining whether they are non-trivial needs to know whether they have a trivial body and all their member and base classes' ctor/dtor have trivial body, which is affected by where their bodies are defined or instantiated. Fixes: #72261 Fixes: SWDEV-432412	2023-12-01 16:24:01 -05:00
Yaxun (Sam) Liu	6b3470b4b8	Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394 )" This reverts commit 27e6e4a4d0e3296cebad8db577ec0469a286795e. This patch is reverted due to regression. A testcase is: `template <class T> struct ptr { ~ptr() { static int x = 1;} }; template <class T> struct Abc : ptr<T> { public: Abc(); ~Abc() {} }; template class Abc<int>; `	2023-11-22 21:20:53 -05:00
Yaxun (Sam) Liu	22078bd9f6	Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815 )" This reverts commit a1e2c6566305061c115954b048f2957c8d55cb5b. Revert this patch due to regression. A testcase is: `template <typename T> class C { explicit C() {}; }; template <> C<int>::C() {}; `	2023-11-22 21:04:55 -05:00
Yaxun (Sam) Liu	a1e2c65663	[CUDA][HIP] ignore implicit host/device attr for override (#72815 ) When deciding whether a previous function declaration is an overload or override, implicit host/device attrs should not be considered. This fixes the failure for the following code: `template <typename T> class C { explicit C() {}; }; template <> C<int>::C() {}; ` The issue was introduced by https://github.com/llvm/llvm-project/pull/72394 since the template specialization is treated as an overload because implicit host/device attrs are considered for overload/override differentiation.	2023-11-20 16:06:48 -05:00
Yaxun (Sam) Liu	27e6e4a4d0	[CUDA][HIP] make trivial ctor/dtor host device (#72394 ) Make trivial ctor/dtor implicitly host device functions so that they can be used to initialize file-scope device variables to match nvcc behavior. Fixes: https://github.com/llvm/llvm-project/issues/72261 Fixes: SWDEV-432412	2023-11-16 08:42:54 -05:00
Yaxun (Sam) Liu	9774d0ce5f	[CUDA][HIP] Make template implicitly host device (#70369 ) Added option -foffload-implicit-host-device-templates which is off by default. When the option is on, template functions and specializations without host/device attributes have implicit host device attributes. They can be overridden by device template functions with the same signature. They are emitted on device side only if they are used on device side. This feature is added as an extension. `__has_extension(cuda_implicit_host_device_templates)` can be used to check whether it is enabled. This is to facilitate using standard C++ headers for device. Fixes: https://github.com/llvm/llvm-project/issues/69956 Fixes: SWDEV-428314	2023-11-09 20:36:38 -05:00
Aaron Ballman	84a3aadf0f	Diagnose use of VLAs in C++ by default Reapplication of 7339c0f782d5c70e0928f8991b0c05338a90c84c with a fix for a crash involving arrays without a size expression. Clang supports VLAs in C++ as an extension, but we currently only warn on their use when you pass -Wvla, -Wvla-extension, or -pedantic. However, VLAs as they're expressed in C have been considered by WG21 and rejected, are easy to use accidentally to the surprise of users (e.g., https://ddanilov.me/default-non-standard-features/), and they have potential security implications beyond constant-size arrays (https://wiki.sei.cmu.edu/confluence/display/c/ARR32-C.+Ensure+size+arguments+for+variable+length+arrays+are+in+a+valid+range). C++ users should strongly consider using other functionality such as std::vector instead. This seems like sufficiently compelling evidence to warn users about VLA use by default in C++ modes. This patch enables the -Wvla-extension diagnostic group in C++ language modes by default, and adds the warning group to -Wall in GNU++ language modes. The warning is still opt-in in C language modes, where support for VLAs is somewhat less surprising to users. RFC: https://discourse.llvm.org/t/rfc-diagnosing-use-of-vlas-in-c/73109 Fixes https://github.com/llvm/llvm-project/issues/62836 Differential Revision: https://reviews.llvm.org/D156565	2023-10-20 13:10:03 -04:00
Aaron Ballman	f5043f46c0	Revert "Diagnose use of VLAs in C++ by default" This reverts commit 7339c0f782d5c70e0928f8991b0c05338a90c84c. Breaks bots: https://lab.llvm.org/buildbot/#/builders/139/builds/51875 https://lab.llvm.org/buildbot/#/builders/164/builds/45262	2023-10-20 10:00:18 -04:00
Aaron Ballman	7339c0f782	Diagnose use of VLAs in C++ by default Clang supports VLAs in C++ as an extension, but we currently only warn on their use when you pass -Wvla, -Wvla-extension, or -pedantic. However, VLAs as they're expressed in C have been considered by WG21 and rejected, are easy to use accidentally to the surprise of users (e.g., https://ddanilov.me/default-non-standard-features/), and they have potential security implications beyond constant-size arrays (https://wiki.sei.cmu.edu/confluence/display/c/ARR32-C.+Ensure+size+arguments+for+variable+length+arrays+are+in+a+valid+range). C++ users should strongly consider using other functionality such as std::vector instead. This seems like sufficiently compelling evidence to warn users about VLA use by default in C++ modes. This patch enables the -Wvla-extension diagnostic group in C++ language modes by default, and adds the warning group to -Wall in GNU++ language modes. The warning is still opt-in in C language modes, where support for VLAs is somewhat less surprising to users. RFC: https://discourse.llvm.org/t/rfc-diagnosing-use-of-vlas-in-c/73109 Fixes https://github.com/llvm/llvm-project/issues/62836 Differential Revision: https://reviews.llvm.org/D156565	2023-10-20 09:50:21 -04:00
Yaxun (Sam) Liu	fc53b1abf7	[CUDA][HIP] Fix init var diag in temmplate (#69081 ) Currently clang diagnoses the following code: (https://godbolt.org/z/s8zK3E5P5) but nvcc does not. ` struct A { constexpr A(){} }; struct B { A a; int b; }; template<typename T> __global__ void kernel( ) { __shared__ B x; } ` Clang generates an implicit trivial ctor for struct B, which should be allowed for initializing a shared variable. However, the body of the ctor is defined only if the template kernel is instantiated. Clang checks the initialization of variable in non-instantiated templates, where it cannot find the body of the ctor, therefore diagnoses it. This patch skips the check for non-instantiated templates.	2023-10-17 10:00:32 -04:00
Jakub Chlanda	3f8d4a8ef2	Reland [NVPTX] Add support for maxclusterrank in launch_bounds (#66496 ) (#67667 ) This reverts commit 0afbcb20fd908f8bf9073697423da097be7db592.	2023-09-29 08:39:31 +02:00
Sam McCall	0afbcb20fd	Revert "[NVPTX] Add support for maxclusterrank in launch_bounds (#66496 )" This reverts commit dfab31b41b4988b6dc8129840eba68f0c36c0f13. SemaDeclAttr.cpp cannot depend on Basic's private headers (lib/Basic/Targets/NVPTX.h)	2023-09-27 10:59:04 +02:00
Jakub Chlanda	dfab31b41b	[NVPTX] Add support for maxclusterrank in launch_bounds (#66496 ) Since SM_90 CUDA supports specifying additional argument to the launch_bounds attribute: maxBlocksPerCluster, to express the maximum number of CTAs that can be part of the cluster. See: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives-maxclusterrank and https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds for details.	2023-09-27 08:51:26 +02:00
Yaxun (Sam) Liu	9b7763821a	Reland "[CUDA][HIP] Fix overloading resolution in global var init" (#65606 ) https://reviews.llvm.org/D158247 caused regressions for HIP on Windows and was reverted. A reduced test case is: ``` typedef void (__stdcall* funcTy)(); void invoke(funcTy f); static void __stdcall callee() noexcept { } void foo() { invoke(callee); } ``` It is due to clang missing the handling of host/device attributes for calling conventions in a few places. This patch fixes that.	2023-09-07 23:18:30 -04:00
Yaxun (Sam) Liu	27313b68ef	Revert "[CUDA][HIP] Fix overloading resolution in global variable initializer" This reverts commit de0df639724b10001ea9a74539381ea494296be9. It was reverted due to regression in HIP unit test on Windows: In file included from C:\hip-tests\catch\unit\graph\hipGraphClone.cc:37: In file included from C:\hip-tests\catch\.\include\hip_test_common.hh:24: In file included from C:\hip-tests\catch\.\include/hip_test_context.hh:24: In file included from C:/install/native/Release/x64/hip/include\hip/hip_runtime.h:54: C:/dk/win\vc\14.31.31107\include\thread:76:70: error: cannot initialize a parameter of type '_beginthreadex_proc_type' (aka 'unsigned int (*)(void *) __attribute__((stdcall))') with an lvalue of type 'const unsigned int (*)(void *) noexcept __attribute__((stdcall))': different exception specifications 76 \| reinterpret_cast<void*>(_CSTD _beginthreadex(nullptr, 0, _Invoker_proc, _Decay_copied.get(), 0, &_Thr._Id)); \| ^~~~~~~~~~~~~ C:\hip-tests\catch\unit\graph\hipGraphClone.cc:290:21) &>' requested here 90 \| _Start(_STD forward<_Fn>(_Fx), _STD forward<_Args>(_Ax)...); \| ^ C:\hip-tests\catch\unit\graph\hipGraphClone.cc:290:21) &, 0>' requested here 311 \| std::thread t(lambdaFunc); \| ^ C:/dk/win\ms_wdk\e22621\Include\10.0.22621.0\ucrt\process.h:99:40: note: passing argument to parameter '_StartAddress' here 99 \| _In_ _beginthreadex_proc_type _StartAddress, \| ^ 1 error generated when compiling for gfx1030.	2023-08-31 09:02:49 -04:00
Yaxun (Sam) Liu	de0df63972	[CUDA][HIP] Fix overloading resolution in global variable initializer Currently, clang does not resolve certain overloaded functions correctly in the initializer of global variables, e.g. template<typename T1, typename U> T1 mypow(T1, U); __attribute__((device)) double mypow(double, int); double t_extent = mypow(1.0, 2); In the above example, mypow is supposed to resolve to the host version but clang resolves it to the device version instead, and emits an error (https://godbolt.org/z/17xxzaa67). However, if the variable is assigned in a host function, there is no error. The discrepancy in overloading resolution inside and outside of a function is due to clang not accounting for the host/device target when resolving functions called in the initializer of a global variable. This patch introduces a global host/device target context for CUDA/HIP for functions called outside of functions. For global variable initialization, it is determined by the host/device attribute of the variable. For other situations, a default value of host_device is sufficient. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D158247 Fixes: SWDEV-416731	2023-08-29 10:17:24 -04:00
Yaxun (Sam) Liu	ea72a4e654	[CUDA][HIP] Fix template argument deduction nvcc allows using std::malloc and std::free in device code. When std::malloc or std::free is passed as a template function argument with template argument deduction, there are no diagnostics. e.g. __global__ void kern() { void *p = std::malloc(1); std::free(p); } int main() { std::shared_ptr<float> a; a = std::shared_ptr<float>( (float*)std::malloc(sizeof(float) * 100), std::free ); return 0; } However, the same code fails to compile with clang (https://godbolt.org/z/1roGvo6YY). The reason is that clang does not have logic to choose a function argument from an overloaded set of candidates based on host/device attributes for template argument deduction. Currently, clang does have logic to choose a candidate based on the constraints of the candidates. This patch extends that logic to account for the CUDA host/device-based preference. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D154300	2023-08-08 17:39:01 -04:00
Yaxun (Sam) Liu	247cc265e7	[CUDA][HIP] Fix overloading resolution of delete operator Currently clang does not consider host/device preference when resolving the delete operator in the file scope, which causes the device operator delete to be selected for class member initialization. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D156795	2023-08-08 09:50:24 -04:00
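
The "Bypass TAD during overload resolution if a perfect match exists" entries in the log rest on a language rule that can be observed in plain C++: when a non-template candidate matches every argument exactly, it wins over a function template specialization, so deducing the template cannot change the outcome. The sketch below only illustrates that observable result; it says nothing about how Clang implements the shortcut internally, and the name `describe` is hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical example; `describe` does not come from the Clang sources or tests.

// Non-template candidate: a perfect match when the argument is an int.
inline std::string describe(int) { return "non-template"; }

// Template candidate: viable for any argument type after deduction.
template <typename T>
std::string describe(T) { return "template"; }
```

Calling `describe(1)` selects the non-template overload, because both candidates are exact matches and the non-template one is preferred. Calling `describe(1.5)` or `describe('a')` selects the template, because the non-template overload would need a conversion or promotion and so is not a perfect match; in those cases deduction is still required.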

272 Commits