272 Commits

Author SHA1 Message Date
Oleksandr T.
2e8e254d18
[Clang] include attribute scope in diagnostics (#144619)
This patch updates diagnostics to print fully qualified attribute names,
including scope when present.
2025-07-08 11:36:52 +03:00
Artem Belevich
59ef2c34a1
[CUDA] Disallow use of address_space(N) on CUDA device variables. (#142857)
The variables have implicit host-side shadow instances and explicit
address space attribute breaks them on the host.
2025-06-09 10:39:04 -07:00
Denis.G
b3a6d434a7
[Clang] Allow parsing arbitrary order of attributes for declarations (#133107)
Enable parsing alignas attribute after GNU attributes, before
ParseDeclaration

This might be useful for cuda code where __shared__ and other
specificators may be mixed with align.

I'd be glad to see if there are any better places or other technique to
process this attribute without interrupting current flow of parsing.
2025-05-09 14:10:18 -04:00
Yaxun (Sam) Liu
c16297cd3f
[CUDA][HIP] Fix host/device attribute of builtin (#138162)
When a builtin function is passed a pointer with a different
address space, clang creates an overloaded
builtin function but does not copy the host/device attribute. This
causes
error when the builtin is called by device functions
since CUDA/HIP relies on the host/device attribute to treat
a builtin function as callable on both host and device
sides.

Fixed by copying the host/device attribute of the original
builtin function to the created overloaded builtin function.
2025-05-07 22:03:33 -04:00
cor3ntin
8c5a307bd8
[Clang] Bypass TAD during overload resolution if a perfect match exists (#136203)
This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
https://github.com/llvm/llvm-project/pull/122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in OverloadCandidateSet
enough information to be able to deduce template candidates from
OverloadCandidateSet::BestViableFunction. Which means
the lifetime of any object used by template argument must outlive
a call to Add*Template*Candidate.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes https://github.com/llvm/llvm-project/issues/62096
Fixes https://github.com/llvm/llvm-project/issues/74581

Reapplies https://github.com/llvm/llvm-project/pull/133426
2025-04-18 06:10:58 +02:00
cor3ntin
2a91d04b02
Revert "[Clang] Bypass TAD during overload resolution if a perfect match exists" (#136113)
Reverts llvm/llvm-project#136018

Still some bots failing
https://lab.llvm.org/buildbot/#/builders/52/builds/7643
2025-04-17 11:00:56 +02:00
cor3ntin
377ec36b32
[Clang] Bypass TAD during overload resolution if a perfect match exists (#136018)
This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
https://github.com/llvm/llvm-project/pull/122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in `OverloadCandidateSet`
enough information to be able to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. Which means
the lifetime of any object used by template argument must outlive
a call to `Add*Template*Candidate`.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes https://github.com/llvm/llvm-project/issues/62096
Fixes https://github.com/llvm/llvm-project/issues/74581

Reapplies #133426
2025-04-17 08:09:55 +02:00
cor3ntin
6ccc9280ba
Revert "[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists" (#135993)
Reverts llvm/llvm-project#133426

This is failing on some bots
https://lab.llvm.org/buildbot/#/builders/163/builds/17265
2025-04-16 19:40:28 +02:00
cor3ntin
facc57fc25
[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists (#133426)
This implements the same overload resolution behavior as GCC, 
as described in https://wg21.link/p3606 (sections 1-2, not 3)

If, during overload resolution, a non-template candidate is always
picked because each argument is a perfect match (i.e., the source and
target types are the same), we do not perform deduction for any template
candidate that might exist.

The goal is to be able to merge #122423 without being too disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang, we store in `OverloadCandidateSet`
enough information to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. This means the lifetime of
any object used by the template argument must outlive a call to
`Add*Template*Candidate`.

This two-phase resolution is not performed for some initialization as
there are cases where template candidates are a better match per the
standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=edc22c64e527171041876f26a491bb1d03d905d5&to=8170b860bd4b70917005796c05a9be013a95abb2&stat=instructions%3Au

Fixes #62096
Fixes #74581
Fixes #53454
2025-04-16 19:09:45 +02:00
Alexander Shaposhnikov
297f0b3f4c
[CudaSPIRV] Allow using integral non-type template parameters as attribute args (#131546)
Allow using integral non-type template parameters as attribute arguments
of
reqd_work_group_size and work_group_size_hint.

Test plan:
ninja check-all
2025-03-19 10:11:18 -07:00
Yaxun (Sam) Liu
d37a39207b
[CUDA][HIP] fix virtual dtor host/device attr (#128926)
Currently if CUDA/HIP users use template class with virtual dtor
and std::string data member with C++20 and MSVC. When the template
class is explicitly instantiated, there is error about host
function called by host device function (used to be undefined
symbols in linking stage before member destructors were checked
by deferred diagnostics).

It was caused by clang inferring host/device attributes for
default dtors. Since all dtors of member and parent classes
have implicit host device attrs, clang infers the virtual dtor have
implicit host and device attrs. Since virtual dtor of
explicitly instantiated template class must be emitted,
this causes constexpr dtor of std::string emitted, which
calls a host function which was not emitted on device side.

This is a serious issue since it prevents users from
using std::string with C++20 on Windows.

When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative
since it is sure to be emitted. Since an implicit host device
function may call a host function, clang cannot assume it is
always available on device. This guarantees dtors that
may call host functions not to have implicit device attr,
therefore will not be emitted on device side.

Fixes: https://github.com/llvm/llvm-project/issues/108548

Fixes: SWDEV-517435
2025-03-03 10:23:35 -05:00
Yaxun (Sam) Liu
0f0665db06
[CUDA][HIP] check dtor in deferred diag (#129117)
Currently the deferred diag fails to diagnose calling of host function
in host device function in device compilation triggered by destructors.

This can be further divided into two issuse:

1. the deferred diag visitor does not visit dtor of member and parent
class when visiting dtor, which it should

2. the deferred diag visitor does not visit virtual dtor of explicit
template class instantiation, which it should

Due to these issues, some constexpr functions which call host functions
are emitted on device side, which causes undefind symbols in linking
stage, as revealed by
https://github.com/llvm/llvm-project/issues/108548

By fixing these issue, clang will diag the issues early during
compilation instead of linking.
2025-02-28 09:58:19 -05:00
Joseph Huber
3d9409f5bc
[NVPTX] Make ctor/dtor lowering always enabled in NVPTX (#126544)
Summary:
Currently we conditionally enable NVPTX lowering depending on the
language (C/C++/OpenMP). Unfortunately this causes problems because this
option is only present if the backend was enabled, which causes this to
error if you try to make LLVM-IR.

This patch instead makes it the only accepted lowering. The reason we
had it as opt-in before is because it is not handled by CUDA. So, this
pach also introduces diagnostics to prevent *all* creation of
device-side global constructors and destructors. We already did this for
variables, now we do it for attributes as well.

This inverts the responsibility of blocking this from the backend to the
langauage like it should be given that support for this is language
dependent.
2025-02-10 14:25:44 -06:00
Brian Foley
39879e4f40
[Sema] Note member decl when initializer list default constructs member (#121854)
Recently I had a scenario where I had:
1. A class C with many members m_1...m_n of the same type T
2. T's default constructor was deleted
3. I accidentally omitted an explicitly constructed member in the
initializer list C() : m_1(foo), m_2(bar), ... { }

Clang told me that T's default constructor was deleted, and told me that
the call to T() was in C() (which it implicitly was), but didn't tell me
which member was being default constructed.

It was difficult to fix this problem because I had no easy way to list
all the members of type T in C and C's superclasses which would have let
me find which member was missing,

clang/test/CXX/class/class.init/p1.cpp is a simplified version of this
problem (a2 is missing from the initializer list of B)
2025-02-03 19:57:37 +01:00
Yaxun (Sam) Liu
1c99907222
[CUDA][HIP] Fix overriding of constexpr virtual function (#121986)
In C++20 constexpr virtual function is allowed. In C++17 although
non-pure virtual function is not allowed to be constexpr, pure virtual
function is allowed to be constexpr and is allowed to be overriden by
non-constexpr virtual function in the derived class.

The following code compiles as C++:

```
class A
{
public:
    constexpr virtual int f() = 0;
};

class B : public A
{
public:
    int f() override
    {
        return 42;
    }
};
```

However, it fails to compile as CUDA or HIP code. The reason: A::f() is
implicitly host device function whereas B::f() is a host function. Since
they have different targets, clang does not treat B::f() as an override
of A::f(). Instead, it treats B::f() as a name-hiding non-virtual
function for A::f(), and diagnoses it.

This causes any CUDA/HIP program using C++ standard header file
`<format>` from g++-13 to fail to compile since such usage patten show
up there:

```
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3564:34: error: non-virtual member function marked 'override' hides virtual member function
 3564 |       _M_format_arg(size_t __id) override
      |                                  ^
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3538:30: note: hidden overloaded virtual function 'std::__format::_Scanner<char>::_M_format_arg' declared here
 3538 |       constexpr virtual void _M_format_arg(size_t __id) = 0;
      |                              ^
```

This is a serious issue and there is no workaround.

This patch allows non-constexpr function to override constexpr virtual
function for CUDA and HIP. This should be OK since non-constexpr
function without explicit host or device attribute can only be called in
host functions.

Fixes: SWDEV-507350
2025-01-09 14:05:27 -05:00
Alexander Shaposhnikov
df13acf344
[CudaSPIRV] Add support for optional spir-v attributes (#116589)
Add support for optional spir-v attributes.

Test plan:
ninja check-all
2024-11-19 13:14:45 -08:00
Artem Belevich
7c3fdcc276
[CUDA] Add support for __grid_constant__ attribute (#114589)
LLVM support for the attribute has been implemented already, so it just
plumbs it through to the CUDA front-end.

One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On the older GPUs it will just be
ignored. The attribute is a performance hint, and does not warrant a
hard error if compiler can't benefit from it on a particular GPU
variant.
2024-11-05 10:48:54 -08:00
Christudasan Devadasan
91fd1b4f32
[HIP] Always add -fnative-half-arguments-and-returns cmdline option. (#113335)
This command-line option is now required while building the HIP
applications (mainly for the host side) after we enabled __fp16
args and return values with patches D133885 & D145345.
2024-10-22 22:07:35 +05:30
darkbuck
fa84297002
[clang][CUDA] Add 'noconvergent' function and statement attribute
- For languages following SPMD/SIMT programming model, functions and
  call sites are marked 'convergent' by default. 'noconvergent' is added
  in this patch to allow developers to remove that 'convergent'
  attribute when it's safe.

Reviewers:
nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/100637
2024-07-31 11:30:48 -04:00
c8ef
3f222f3bc6
[NFC] Fix some typos (#98791) 2024-07-14 13:28:11 +02:00
Raymond Tian
8477ca6e8e
[HIP][Clang][Sema] Fix crash when calling builtins with pointer arguments (#95957)
Crashed when the number of args passed was less than number of
parameters in builtin definition, because we were indexing the list of
args while iterating through the entire number of parameters.
2024-07-01 14:54:04 -04:00
Lukacma
8a46bbbc22
[Clang] Remove preprocessor guards and global feature checks for NEON (#95224)
To enable function multi-versioning (FMV), current checks which rely on
cmd line options or global macros to see if target feature is present
need to be removed. This patch removes those for NEON and also
implements changes to NEON header file as proposed in
[ACLE](https://github.com/ARM-software/acle/pull/321).
2024-06-25 17:19:42 +02:00
Yaxun (Sam) Liu
53d2f4d967
[CUDA][HIP] warn incompatible redeclare (#77359)
nvcc warns about the following code:

`void f();
__device__ void f() {}`

but clang does not since clang allows device function to overload host
function.

Users want clang to emit similar warning to help code to be compatible
with nvcc.

Since this may cause regression with existing code, the warning is off
by default and can be enabled by -Wnvcc-compat.

It won't cause warning in system headers, even with -Wnvcc-compat.
2024-06-10 10:08:26 -04:00
Alex Voicu
88e2bb4092
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
2024-06-07 11:50:23 +01:00
Aaron Ballman
b49ce9c304
Fix more diagnostic wording for style; NFC (#93190)
This tries to fix all of the places where a diagnostic message starts
with a capital letter (other than acroynyms or proper nouns) or ends
with punctuation (other than a question mark).

This is in support of a planned change to tablegen to start diagnosing
incorrect diagnostic message styles.
2024-05-23 14:50:29 -04:00
Fangrui Song
7e59223ac4 [test] %clang_cc1: remove redundant actions
ParseFrontendArgs takes the last OPT_Action_Group option. The other
actions are overridden.
2024-05-05 10:46:06 -07:00
Fangrui Song
7c1d9b15ee [test] %clang_cc1: remove redundant actions 2024-05-04 23:08:11 -07:00
Jun Wang
c4e517f59c
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
Pranav Kant
318bff6811
[clang][CUDA] Disable float128 diagnostics for device compilation (#83918) 2024-03-06 16:40:23 -08:00
Joseph Huber
53e96984b6
[NVPTX] Enable the _Float16 type for NVPTX compilation (#82436)
Summary:
The PTX target supports the f16 type natively and we alreaqdy have a few
LLVM backend tests that support the LLVM-IR. We should be able to enable
this for generic use. This is done prior the f16 math functions being
written in the GPU libc case.
2024-02-20 18:12:27 -06:00
Fangrui Song
fed564432c
[Sema] atomic_compare_exchange: check failure memory order (#74959)
For

`__atomic_compare_exchange{,_n}/__c11_atomic_compare_exchange_{strong,weak}`,
GCC checks both the success memory order and the failure memory order
under the default -Winvalid-memory-model ("memory model" is confusing
here and "memory order" is much more common in the atomic context).

* The failure memory order, if a constant, must be one of
  relaxed/consume/acquire/seq_cst.

Clang checks just the success memory order under the default
-Watomic-memory-ordering. This patch checks the failure memory order.
2023-12-14 11:03:28 -08:00
CarolineConcatto
ed2d497291
[Clang][AArch64] Add fix vector types to header into SVE (#73258)
This patch is needed for the reduction instructions in sve2.1
 It add a new header to sve with all the fixed vector types.
  The new types are only added if neon is not declared.
2023-12-13 08:59:41 +00:00
Yaxun (Sam) Liu
2b76e20ea7
[CUDA][HIP] allow trivial ctor/dtor in device var init (#73140)
Treat ctor/dtor in device var init as host device function
so that they can be used to initialize file-scope
device variables to match nvcc behavior. If they are non-trivial
they will be diagnosed.

We cannot add implicit host device attrs to non-trivial
ctor/dtor since determining whether they are non-trivial
needs to know whether they have a trivial body and all their
member and base classes' ctor/dtor have trivial body, which
is affected by where their bodies are defined or instantiated.

Fixes: #72261

Fixes: SWDEV-432412
2023-12-01 16:24:01 -05:00
Yaxun (Sam) Liu
6b3470b4b8 Revert "[CUDA][HIP] make trivial ctor/dtor host device (#72394)"
This reverts commit 27e6e4a4d0e3296cebad8db577ec0469a286795e.

This patch is reverted due to regression. A testcase is:

`template <class T>
struct ptr {
    ~ptr() { static int x = 1;}
};

template <class T>
struct Abc : ptr<T> {
 public:
  Abc();
  ~Abc() {}
};

template
class Abc<int>;
`
2023-11-22 21:20:53 -05:00
Yaxun (Sam) Liu
22078bd9f6 Revert "[CUDA][HIP] ignore implicit host/device attr for override (#72815)"
This reverts commit a1e2c6566305061c115954b048f2957c8d55cb5b.

Revert this patch due to regression. A testcase is:

`template <typename T>
class C {
    explicit C() {};
};

template <> C<int>::C() {};
`
2023-11-22 21:04:55 -05:00
Yaxun (Sam) Liu
a1e2c65663
[CUDA][HIP] ignore implicit host/device attr for override (#72815)
When deciding whether a previous function declaration is an overload or
override, implicit host/device attrs should not be considered.

This fixes the failure for the following code:

`template <typename T>
class C {
    explicit C() {};
};

template <> C<int>::C() {};
`

The issue was introduced by
https://github.com/llvm/llvm-project/pull/72394

sine the template specialization is treated as overload due to implicit
host/device attrs are considered for overload/override differentiation.
2023-11-20 16:06:48 -05:00
Yaxun (Sam) Liu
27e6e4a4d0
[CUDA][HIP] make trivial ctor/dtor host device (#72394)
Make trivial ctor/dtor implicitly host device functions so that they can
be used to initialize file-scope
device variables to match nvcc behavior.

Fixes: https://github.com/llvm/llvm-project/issues/72261

Fixes: SWDEV-432412
2023-11-16 08:42:54 -05:00
Yaxun (Sam) Liu
9774d0ce5f
[CUDA][HIP] Make template implicitly host device (#70369)
Added option -foffload-implicit-host-device-templates which is off by
default.

When the option is on, template functions and specializations without
host/device attributes have implicit host device attributes.

They can be overridden by device template functions with the same
signagure.
They are emitted on device side only if they are used on device side.

This feature is added as an extension.
`__has_extension(cuda_implicit_host_device_templates)` can be used to
check whether it is enabled.

This is to facilitate using standard C++ headers for device.

Fixes: https://github.com/llvm/llvm-project/issues/69956

Fixes: SWDEV-428314
2023-11-09 20:36:38 -05:00
Aaron Ballman
84a3aadf0f Diagnose use of VLAs in C++ by default
Reapplication of 7339c0f782d5c70e0928f8991b0c05338a90c84c with a fix
for a crash involving arrays without a size expression.

Clang supports VLAs in C++ as an extension, but we currently only warn
on their use when you pass -Wvla, -Wvla-extension, or -pedantic.
However, VLAs as they're expressed in C have been considered by WG21
and rejected, are easy to use accidentally to the surprise of users
(e.g., https://ddanilov.me/default-non-standard-features/), and they
have potential security implications beyond constant-size arrays
(https://wiki.sei.cmu.edu/confluence/display/c/ARR32-C.+Ensure+size+arguments+for+variable+length+arrays+are+in+a+valid+range).
C++ users should strongly consider using other functionality such as
std::vector instead.

This seems like sufficiently compelling evidence to warn users about
VLA use by default in C++ modes. This patch enables the -Wvla-extension
diagnostic group in C++ language modes by default, and adds the warning
group to -Wall in GNU++ language modes. The warning is still opt-in in
C language modes, where support for VLAs is somewhat less surprising to
users.

RFC: https://discourse.llvm.org/t/rfc-diagnosing-use-of-vlas-in-c/73109
Fixes https://github.com/llvm/llvm-project/issues/62836
Differential Revision: https://reviews.llvm.org/D156565
2023-10-20 13:10:03 -04:00
Aaron Ballman
f5043f46c0 Revert "Diagnose use of VLAs in C++ by default"
This reverts commit 7339c0f782d5c70e0928f8991b0c05338a90c84c.

Breaks bots:
https://lab.llvm.org/buildbot/#/builders/139/builds/51875
https://lab.llvm.org/buildbot/#/builders/164/builds/45262
2023-10-20 10:00:18 -04:00
Aaron Ballman
7339c0f782 Diagnose use of VLAs in C++ by default
Clang supports VLAs in C++ as an extension, but we currently only warn
on their use when you pass -Wvla, -Wvla-extension, or -pedantic.
However, VLAs as they're expressed in C have been considered by WG21
and rejected, are easy to use accidentally to the surprise of users
(e.g., https://ddanilov.me/default-non-standard-features/), and they
have potential security implications beyond constant-size arrays
(https://wiki.sei.cmu.edu/confluence/display/c/ARR32-C.+Ensure+size+arguments+for+variable+length+arrays+are+in+a+valid+range).
C++ users should strongly consider using other functionality such as
std::vector instead.

This seems like sufficiently compelling evidence to warn users about
VLA use by default in C++ modes. This patch enables the -Wvla-extension
diagnostic group in C++ language modes by default, and adds the warning
group to -Wall in GNU++ language modes. The warning is still opt-in in
C language modes, where support for VLAs is somewhat less surprising to
users.

RFC: https://discourse.llvm.org/t/rfc-diagnosing-use-of-vlas-in-c/73109
Fixes https://github.com/llvm/llvm-project/issues/62836
Differential Revision: https://reviews.llvm.org/D156565
2023-10-20 09:50:21 -04:00
Yaxun (Sam) Liu
fc53b1abf7
[CUDA][HIP] Fix init var diag in temmplate (#69081)
Currently clang diagnoses the following code:
(https://godbolt.org/z/s8zK3E5P5) but nvcc
does not.

`
struct A {
   constexpr A(){}
};

struct  B {
  A a;
  int b;
};

template<typename T>
__global__ void kernel( )
{
   __shared__ B x;
}
`

Clang generates an implicit trivial ctor for struct B, which should be
allowed for initializing a shared variable.

However, the body of the ctor is defined only if the template kernel is
instantiated. Clang checks the initialization of variable in
non-instantiated templates, where it cannot find the body of the ctor,
therefore diagnoses it.

This patch skips the check for non-instantiated templates.
2023-10-17 10:00:32 -04:00
Jakub Chlanda
3f8d4a8ef2
Reland [NVPTX] Add support for maxclusterrank in launch_bounds (#66496) (#67667)
This reverts commit 0afbcb20fd908f8bf9073697423da097be7db592.
2023-09-29 08:39:31 +02:00
Sam McCall
0afbcb20fd Revert "[NVPTX] Add support for maxclusterrank in launch_bounds (#66496)"
This reverts commit dfab31b41b4988b6dc8129840eba68f0c36c0f13.

SemaDeclAttr.cpp cannot depend on Basic's private headers
(lib/Basic/Targets/NVPTX.h)
2023-09-27 10:59:04 +02:00
Jakub Chlanda
dfab31b41b
[NVPTX] Add support for maxclusterrank in launch_bounds (#66496)
Since SM_90 CUDA supports specifying additional argument to the
launch_bounds attribute: maxBlocksPerCluster, to express the maximum
number of CTAs that can be part of the cluster. See:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives-maxclusterrank
and

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds
for details.
2023-09-27 08:51:26 +02:00
Yaxun (Sam) Liu
9b7763821a
Reland "[CUDA][HIP] Fix overloading resolution in global var init" (#65606)
https://reviews.llvm.org/D158247 caused regressions for HIP on Windows
and was reverted.

A reduced test case is:

```
typedef void (__stdcall* funcTy)();
void invoke(funcTy f);

static void __stdcall callee() noexcept {
}

void foo() {
   invoke(callee);
}
```

It is due to clang missing handling host/device attributes for calling
convention at a few places

This patch fixes that.
2023-09-07 23:18:30 -04:00
Yaxun (Sam) Liu
27313b68ef Revert "[CUDA][HIP] Fix overloading resolution in global variable initializer"
This reverts commit de0df639724b10001ea9a74539381ea494296be9.

It was reverted due to regression in HIP unit test on Windows:

 In file included from C:\hip-tests\catch\unit\graph\hipGraphClone.cc:37:

 In file included from C:\hip-tests\catch\.\include\hip_test_common.hh:24:

 In file included from C:\hip-tests\catch\.\include/hip_test_context.hh:24:

 In file included from C:/install/native/Release/x64/hip/include\hip/hip_runtime.h:54:

 C:/dk/win\vc\14.31.31107\include\thread:76:70: error: cannot initialize a parameter of type '_beginthreadex_proc_type' (aka 'unsigned int (*)(void *) __attribute__((stdcall))') with an lvalue of type 'const unsigned int (*)(void *) noexcept __attribute__((stdcall))': different exception specifications

    76 |             reinterpret_cast<void*>(_CSTD _beginthreadex(nullptr, 0, _Invoker_proc, _Decay_copied.get(), 0, &_Thr._Id));

       |                                                                      ^~~~~~~~~~~~~

 C:\hip-tests\catch\unit\graph\hipGraphClone.cc:290:21) &>' requested here

    90 |         _Start(_STD forward<_Fn>(_Fx), _STD forward<_Args>(_Ax)...);

       |         ^

 C:\hip-tests\catch\unit\graph\hipGraphClone.cc:290:21) &, 0>' requested here

   311 |     std::thread t(lambdaFunc);

       |                 ^

 C:/dk/win\ms_wdk\e22621\Include\10.0.22621.0\ucrt\process.h:99:40: note: passing argument to parameter '_StartAddress' here

    99 |     _In_      _beginthreadex_proc_type _StartAddress,

       |                                        ^

 1 error generated when compiling for gfx1030.
2023-08-31 09:02:49 -04:00
Yaxun (Sam) Liu
de0df63972 [CUDA][HIP] Fix overloading resolution in global variable initializer
Currently, clang does not resolve certain overloaded functions correctly in the initializer
of global variables, e.g.

template<typename T1, typename U>
T1 mypow(T1, U);

__attribute__((device)) double mypow(double, int);

double t_extent = mypow(1.0, 2);

In the above example, mypow is supposed to resolve to the host version
but clang resolves it to the device version instead, and emits an error
(https://godbolt.org/z/17xxzaa67).

However, if the variable is assigned in a host function, there is no error.
The discrepancy in overloading resolution inside and outside of
a function is due to clang not accounting for the host/device target
when resolving functions called in the initializer of a global variable.

This patch introduces a global host/device target context for CUDA/HIP
for functions called outside of functions. For global variable initialization,
it is determined by the host/device attribute of the variable. For other
situations, a default value of host_device is sufficient.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D158247

Fixes: SWDEV-416731
2023-08-29 10:17:24 -04:00
Yaxun (Sam) Liu
ea72a4e654 [CUDA][HIP] Fix template argument deduction
nvcc allows using std::malloc and std::free in device code.
When std::malloc or std::free is passed as a template
function argument with template argument deduction,
there is no diagnostics. e.g.

__global__ void kern() {
    void *p = std::malloc(1);
    std::free(p);
}
int main()
{

    std::shared_ptr<float> a;
    a = std::shared_ptr<float>(
      (float*)std::malloc(sizeof(float) * 100),
      std::free
    );
    return 0;
}
However, the same code fails to compile with clang
(https://godbolt.org/z/1roGvo6YY). The reason is
that clang does not have logic to choose a function
argument from an overloaded set of candidates
based on host/device attributes for template argument
deduction.

Currently, clang does have a logic to choose a candidate
based on the constraints of the candidates. This patch
extends that logic to account for the CUDA host/device-based
preference.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D154300
2023-08-08 17:39:01 -04:00
Yaxun (Sam) Liu
247cc265e7 [CUDA][HIP] Fix overloading resolution of delete operator
Currently clang does not consider host/device preference
when resolving delete operator in the file scope, which
causes device operator delete selected for class member
initialization.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D156795
2023-08-08 09:50:24 -04:00