Enable parsing alignas attribute after GNU attributes, before
ParseDeclaration
This might be useful for cuda code where __shared__ and other
specificators may be mixed with align.
I'd be glad to see if there are any better places or other technique to
process this attribute without interrupting current flow of parsing.
When a builtin function is passed a pointer with a different
address space, clang creates an overloaded
builtin function but does not copy the host/device attribute. This
causes
error when the builtin is called by device functions
since CUDA/HIP relies on the host/device attribute to treat
a builtin function as callable on both host and device
sides.
Fixed by copying the host/device attribute of the original
builtin function to the created overloaded builtin function.
This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)
If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.
The goal is to be able to merge
https://github.com/llvm/llvm-project/pull/122423 without being too
disruptive.
This change means that the selection of the best viable candidate and
template argument deduction become interleaved.
To avoid rewriting half of Clang we store in `OverloadCandidateSet`
enough information to be able to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. Which means
the lifetime of any object used by template argument must outlive
a call to `Add*Template*Candidate`.
This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.
The change has a nice impact on compile times
https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au
Fixes https://github.com/llvm/llvm-project/issues/62096
Fixes https://github.com/llvm/llvm-project/issues/74581
Reapplies #133426
This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (sections 1-2, not 3)
If, during overload resolution, a non-template candidate is always
picked because each argument is a perfect match (i.e., the source and
target types are the same), we do not perform deduction for any template
candidate that might exist.
The goal is to be able to merge #122423 without being too disruptive.
This change means that the selection of the best viable candidate and
template argument deduction become interleaved.
To avoid rewriting half of Clang, we store in `OverloadCandidateSet`
enough information to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. This means the lifetime of
any object used by the template argument must outlive a call to
`Add*Template*Candidate`.
This two-phase resolution is not performed for some initialization as
there are cases where template candidates are a better match per the
standard. It's also bypassed for code completion.
The change has a nice impact on compile times
https://llvm-compile-time-tracker.com/compare.php?from=edc22c64e527171041876f26a491bb1d03d905d5&to=8170b860bd4b70917005796c05a9be013a95abb2&stat=instructions%3AuFixes#62096Fixes#74581Fixes#53454
Currently if CUDA/HIP users use template class with virtual dtor
and std::string data member with C++20 and MSVC. When the template
class is explicitly instantiated, there is error about host
function called by host device function (used to be undefined
symbols in linking stage before member destructors were checked
by deferred diagnostics).
It was caused by clang inferring host/device attributes for
default dtors. Since all dtors of member and parent classes
have implicit host device attrs, clang infers the virtual dtor have
implicit host and device attrs. Since virtual dtor of
explicitly instantiated template class must be emitted,
this causes constexpr dtor of std::string emitted, which
calls a host function which was not emitted on device side.
This is a serious issue since it prevents users from
using std::string with C++20 on Windows.
When inferring host device attr of virtual dtor of explicit
template class instantiation, clang should be conservative
since it is sure to be emitted. Since an implicit host device
function may call a host function, clang cannot assume it is
always available on device. This guarantees dtors that
may call host functions not to have implicit device attr,
therefore will not be emitted on device side.
Fixes: https://github.com/llvm/llvm-project/issues/108548
Fixes: SWDEV-517435
Currently the deferred diag fails to diagnose calling of host function
in host device function in device compilation triggered by destructors.
This can be further divided into two issuse:
1. the deferred diag visitor does not visit dtor of member and parent
class when visiting dtor, which it should
2. the deferred diag visitor does not visit virtual dtor of explicit
template class instantiation, which it should
Due to these issues, some constexpr functions which call host functions
are emitted on device side, which causes undefind symbols in linking
stage, as revealed by
https://github.com/llvm/llvm-project/issues/108548
By fixing these issue, clang will diag the issues early during
compilation instead of linking.
Summary:
Currently we conditionally enable NVPTX lowering depending on the
language (C/C++/OpenMP). Unfortunately this causes problems because this
option is only present if the backend was enabled, which causes this to
error if you try to make LLVM-IR.
This patch instead makes it the only accepted lowering. The reason we
had it as opt-in before is because it is not handled by CUDA. So, this
pach also introduces diagnostics to prevent *all* creation of
device-side global constructors and destructors. We already did this for
variables, now we do it for attributes as well.
This inverts the responsibility of blocking this from the backend to the
langauage like it should be given that support for this is language
dependent.
Recently I had a scenario where I had:
1. A class C with many members m_1...m_n of the same type T
2. T's default constructor was deleted
3. I accidentally omitted an explicitly constructed member in the
initializer list C() : m_1(foo), m_2(bar), ... { }
Clang told me that T's default constructor was deleted, and told me that
the call to T() was in C() (which it implicitly was), but didn't tell me
which member was being default constructed.
It was difficult to fix this problem because I had no easy way to list
all the members of type T in C and C's superclasses which would have let
me find which member was missing,
clang/test/CXX/class/class.init/p1.cpp is a simplified version of this
problem (a2 is missing from the initializer list of B)
In C++20 constexpr virtual function is allowed. In C++17 although
non-pure virtual function is not allowed to be constexpr, pure virtual
function is allowed to be constexpr and is allowed to be overriden by
non-constexpr virtual function in the derived class.
The following code compiles as C++:
```
class A
{
public:
constexpr virtual int f() = 0;
};
class B : public A
{
public:
int f() override
{
return 42;
}
};
```
However, it fails to compile as CUDA or HIP code. The reason: A::f() is
implicitly host device function whereas B::f() is a host function. Since
they have different targets, clang does not treat B::f() as an override
of A::f(). Instead, it treats B::f() as a name-hiding non-virtual
function for A::f(), and diagnoses it.
This causes any CUDA/HIP program using C++ standard header file
`<format>` from g++-13 to fail to compile since such usage patten show
up there:
```
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3564:34: error: non-virtual member function marked 'override' hides virtual member function
3564 | _M_format_arg(size_t __id) override
| ^
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/format:3538:30: note: hidden overloaded virtual function 'std::__format::_Scanner<char>::_M_format_arg' declared here
3538 | constexpr virtual void _M_format_arg(size_t __id) = 0;
| ^
```
This is a serious issue and there is no workaround.
This patch allows non-constexpr function to override constexpr virtual
function for CUDA and HIP. This should be OK since non-constexpr
function without explicit host or device attribute can only be called in
host functions.
Fixes: SWDEV-507350
LLVM support for the attribute has been implemented already, so it just
plumbs it through to the CUDA front-end.
One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On the older GPUs it will just be
ignored. The attribute is a performance hint, and does not warrant a
hard error if compiler can't benefit from it on a particular GPU
variant.
This command-line option is now required while building the HIP
applications (mainly for the host side) after we enabled __fp16
args and return values with patches D133885 & D145345.
- For languages following SPMD/SIMT programming model, functions and
call sites are marked 'convergent' by default. 'noconvergent' is added
in this patch to allow developers to remove that 'convergent'
attribute when it's safe.
Reviewers:
nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/100637
Crashed when the number of args passed was less than number of
parameters in builtin definition, because we were indexing the list of
args while iterating through the entire number of parameters.
To enable function multi-versioning (FMV), current checks which rely on
cmd line options or global macros to see if target feature is present
need to be removed. This patch removes those for NEON and also
implements changes to NEON header file as proposed in
[ACLE](https://github.com/ARM-software/acle/pull/321).
nvcc warns about the following code:
`void f();
__device__ void f() {}`
but clang does not since clang allows device function to overload host
function.
Users want clang to emit similar warning to help code to be compatible
with nvcc.
Since this may cause regression with existing code, the warning is off
by default and can be enabled by -Wnvcc-compat.
It won't cause warning in system headers, even with -Wnvcc-compat.
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
This tries to fix all of the places where a diagnostic message starts
with a capital letter (other than acroynyms or proper nouns) or ends
with punctuation (other than a question mark).
This is in support of a planned change to tablegen to start diagnosing
incorrect diagnostic message styles.
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
Summary:
The PTX target supports the f16 type natively and we alreaqdy have a few
LLVM backend tests that support the LLVM-IR. We should be able to enable
this for generic use. This is done prior the f16 math functions being
written in the GPU libc case.
For
`__atomic_compare_exchange{,_n}/__c11_atomic_compare_exchange_{strong,weak}`,
GCC checks both the success memory order and the failure memory order
under the default -Winvalid-memory-model ("memory model" is confusing
here and "memory order" is much more common in the atomic context).
* The failure memory order, if a constant, must be one of
relaxed/consume/acquire/seq_cst.
Clang checks just the success memory order under the default
-Watomic-memory-ordering. This patch checks the failure memory order.
This patch is needed for the reduction instructions in sve2.1
It add a new header to sve with all the fixed vector types.
The new types are only added if neon is not declared.
Treat ctor/dtor in device var init as host device function
so that they can be used to initialize file-scope
device variables to match nvcc behavior. If they are non-trivial
they will be diagnosed.
We cannot add implicit host device attrs to non-trivial
ctor/dtor since determining whether they are non-trivial
needs to know whether they have a trivial body and all their
member and base classes' ctor/dtor have trivial body, which
is affected by where their bodies are defined or instantiated.
Fixes: #72261
Fixes: SWDEV-432412
This reverts commit 27e6e4a4d0e3296cebad8db577ec0469a286795e.
This patch is reverted due to regression. A testcase is:
`template <class T>
struct ptr {
~ptr() { static int x = 1;}
};
template <class T>
struct Abc : ptr<T> {
public:
Abc();
~Abc() {}
};
template
class Abc<int>;
`
This reverts commit a1e2c6566305061c115954b048f2957c8d55cb5b.
Revert this patch due to regression. A testcase is:
`template <typename T>
class C {
explicit C() {};
};
template <> C<int>::C() {};
`
When deciding whether a previous function declaration is an overload or
override, implicit host/device attrs should not be considered.
This fixes the failure for the following code:
`template <typename T>
class C {
explicit C() {};
};
template <> C<int>::C() {};
`
The issue was introduced by
https://github.com/llvm/llvm-project/pull/72394
sine the template specialization is treated as overload due to implicit
host/device attrs are considered for overload/override differentiation.
Make trivial ctor/dtor implicitly host device functions so that they can
be used to initialize file-scope
device variables to match nvcc behavior.
Fixes: https://github.com/llvm/llvm-project/issues/72261
Fixes: SWDEV-432412
Added option -foffload-implicit-host-device-templates which is off by
default.
When the option is on, template functions and specializations without
host/device attributes have implicit host device attributes.
They can be overridden by device template functions with the same
signagure.
They are emitted on device side only if they are used on device side.
This feature is added as an extension.
`__has_extension(cuda_implicit_host_device_templates)` can be used to
check whether it is enabled.
This is to facilitate using standard C++ headers for device.
Fixes: https://github.com/llvm/llvm-project/issues/69956
Fixes: SWDEV-428314
Reapplication of 7339c0f782d5c70e0928f8991b0c05338a90c84c with a fix
for a crash involving arrays without a size expression.
Clang supports VLAs in C++ as an extension, but we currently only warn
on their use when you pass -Wvla, -Wvla-extension, or -pedantic.
However, VLAs as they're expressed in C have been considered by WG21
and rejected, are easy to use accidentally to the surprise of users
(e.g., https://ddanilov.me/default-non-standard-features/), and they
have potential security implications beyond constant-size arrays
(https://wiki.sei.cmu.edu/confluence/display/c/ARR32-C.+Ensure+size+arguments+for+variable+length+arrays+are+in+a+valid+range).
C++ users should strongly consider using other functionality such as
std::vector instead.
This seems like sufficiently compelling evidence to warn users about
VLA use by default in C++ modes. This patch enables the -Wvla-extension
diagnostic group in C++ language modes by default, and adds the warning
group to -Wall in GNU++ language modes. The warning is still opt-in in
C language modes, where support for VLAs is somewhat less surprising to
users.
RFC: https://discourse.llvm.org/t/rfc-diagnosing-use-of-vlas-in-c/73109
Fixes https://github.com/llvm/llvm-project/issues/62836
Differential Revision: https://reviews.llvm.org/D156565
Currently clang diagnoses the following code:
(https://godbolt.org/z/s8zK3E5P5) but nvcc
does not.
`
struct A {
constexpr A(){}
};
struct B {
A a;
int b;
};
template<typename T>
__global__ void kernel( )
{
__shared__ B x;
}
`
Clang generates an implicit trivial ctor for struct B, which should be
allowed for initializing a shared variable.
However, the body of the ctor is defined only if the template kernel is
instantiated. Clang checks the initialization of variable in
non-instantiated templates, where it cannot find the body of the ctor,
therefore diagnoses it.
This patch skips the check for non-instantiated templates.
https://reviews.llvm.org/D158247 caused regressions for HIP on Windows
and was reverted.
A reduced test case is:
```
typedef void (__stdcall* funcTy)();
void invoke(funcTy f);
static void __stdcall callee() noexcept {
}
void foo() {
invoke(callee);
}
```
It is due to clang missing handling host/device attributes for calling
convention at a few places
This patch fixes that.
This reverts commit de0df639724b10001ea9a74539381ea494296be9.
It was reverted due to regression in HIP unit test on Windows:
In file included from C:\hip-tests\catch\unit\graph\hipGraphClone.cc:37:
In file included from C:\hip-tests\catch\.\include\hip_test_common.hh:24:
In file included from C:\hip-tests\catch\.\include/hip_test_context.hh:24:
In file included from C:/install/native/Release/x64/hip/include\hip/hip_runtime.h:54:
C:/dk/win\vc\14.31.31107\include\thread:76:70: error: cannot initialize a parameter of type '_beginthreadex_proc_type' (aka 'unsigned int (*)(void *) __attribute__((stdcall))') with an lvalue of type 'const unsigned int (*)(void *) noexcept __attribute__((stdcall))': different exception specifications
76 | reinterpret_cast<void*>(_CSTD _beginthreadex(nullptr, 0, _Invoker_proc, _Decay_copied.get(), 0, &_Thr._Id));
| ^~~~~~~~~~~~~
C:\hip-tests\catch\unit\graph\hipGraphClone.cc:290:21) &>' requested here
90 | _Start(_STD forward<_Fn>(_Fx), _STD forward<_Args>(_Ax)...);
| ^
C:\hip-tests\catch\unit\graph\hipGraphClone.cc:290:21) &, 0>' requested here
311 | std::thread t(lambdaFunc);
| ^
C:/dk/win\ms_wdk\e22621\Include\10.0.22621.0\ucrt\process.h:99:40: note: passing argument to parameter '_StartAddress' here
99 | _In_ _beginthreadex_proc_type _StartAddress,
| ^
1 error generated when compiling for gfx1030.
Currently, clang does not resolve certain overloaded functions correctly in the initializer
of global variables, e.g.
template<typename T1, typename U>
T1 mypow(T1, U);
__attribute__((device)) double mypow(double, int);
double t_extent = mypow(1.0, 2);
In the above example, mypow is supposed to resolve to the host version
but clang resolves it to the device version instead, and emits an error
(https://godbolt.org/z/17xxzaa67).
However, if the variable is assigned in a host function, there is no error.
The discrepancy in overloading resolution inside and outside of
a function is due to clang not accounting for the host/device target
when resolving functions called in the initializer of a global variable.
This patch introduces a global host/device target context for CUDA/HIP
for functions called outside of functions. For global variable initialization,
it is determined by the host/device attribute of the variable. For other
situations, a default value of host_device is sufficient.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D158247
Fixes: SWDEV-416731
nvcc allows using std::malloc and std::free in device code.
When std::malloc or std::free is passed as a template
function argument with template argument deduction,
there is no diagnostics. e.g.
__global__ void kern() {
void *p = std::malloc(1);
std::free(p);
}
int main()
{
std::shared_ptr<float> a;
a = std::shared_ptr<float>(
(float*)std::malloc(sizeof(float) * 100),
std::free
);
return 0;
}
However, the same code fails to compile with clang
(https://godbolt.org/z/1roGvo6YY). The reason is
that clang does not have logic to choose a function
argument from an overloaded set of candidates
based on host/device attributes for template argument
deduction.
Currently, clang does have a logic to choose a candidate
based on the constraints of the candidates. This patch
extends that logic to account for the CUDA host/device-based
preference.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D154300
Currently clang does not consider host/device preference
when resolving delete operator in the file scope, which
causes device operator delete selected for class member
initialization.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D156795