For shifts, the LHS and RHS element types might be different. The
variable naming here could probably use some love now, but I'm trying to
fix this as fast as possible.
See the discussion in https://github.com/llvm/llvm-project/pull/108949
If we have a placement-new destination already, use that instead of
allocating a new one.
Tests are partially based on
`test/SemaCXX/cxx2c-constexpr-placement-new.cpp`.
This implements the logic of the `common_type` base template as a
builtin alias. If there should be no `type` member, an empty class is
returned. Otherwise a specialization of a `type_identity`-like class is
returned. The base template (i.e. `std::common_type`) as well as the
empty class and `type_identity`-like struct are given as arguments to
the builtin.
We were calling checkLiteralType() too many time and rejecting some
things we shouldn't. Add The calls manually when handling
MaterializeTemporaryExprs. Maybe we should call it in other places as
well, but adding more calls is easier than removing them from a generic
code path.
Currently, clang rejects the following explicit specialization of `f`
due to the constraints not being equivalent:
```
template<typename T>
struct A
{
template<bool B>
void f() requires B;
};
template<>
template<bool B>
void A<int>::f() requires B { }
```
This happens because, in most cases, we do not set the flag indicating
whether a `RedeclarableTemplate` is an explicit specialization of a
member of an implicitly instantiated class template specialization until
_after_ we compare constraints for equivalence. This patch addresses the
issue (and a number of other issues) by:
- storing the flag indicating whether a declaration is a member
specialization on a per declaration basis, and
- significantly refactoring `Sema::getTemplateInstantiationArgs` so we
collect the right set of template argument in all cases.
Many of our declaration matching & constraint evaluation woes can be
traced back to bugs in `Sema::getTemplateInstantiationArgs`. This
change/refactor should fix a lot of them. It also paves the way for
fixing #101330 and #105462 per my suggestion in #102267 (which I have
implemented on top of this patch but will merge in a subsequent PR).
The previous implementation had a bug where, if it was called on a Decl
later in the redecl chain than `LastCheckedDecl`, it could incorrectly
skip and overlook a Decl with a comment.
The patch addresses this by only using `LastCheckedDecl` if the input
Decl `D` is on the path from the first (canonical) Decl to
`LastCheckedDecl`.
An alternative that was considered was to start the iteration from the
(canonical) Decl, however this ran into problems with the modelling of
explicit template specializations in the AST where the canonical Decl
can be unusual. With the current solution, if no Decls were checked yet,
we prefer to check the input Decl over the canonical one.
Fixes https://github.com/llvm/llvm-project/issues/108145
Tweak encodeTypeForFunctionPointerAuth to handle all AMDGPU builtin
types generically instead of just __amdgpu_buffer_rsrc_t which happens
to be the only one defined so far.
This PR introduces new HLSL resource type attribute
`[[hlsl::raw_buffer]]`. Presence of this attribute on a resource handle
means that the resource does not require typed element access. The
attribute will be used on resource handles that represent buffers like
`StructuredBuffer` or `ByteAddressBuffer` and in DXIL it will be
translated to target extension type `dx.RawBuffer`.
Fixes#107907
Introducing a new HLSL resource type attribute `[[contained_type(T)]]`
which describes the "contained type" of a buffer or resource type.
Specifically, the attribute will be used on the resource handle in
templated buffer types like so:
template <typename T> struct RWBuffer {
__hlsl_resource_t [[hlsl::contained_type(T)]] [[hlsl::resource_class(UAV)]] h;
};
Fixes#104855
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Strip unneeded calls to raw_string_ostream::str(), to avoid extra indirection.
This patch enable the function multiversion(FMV) and `target_clones`
attribute for RISC-V target.
The proposal of `target_clones` syntax can be found at the
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/48 (which has
landed), as modified by the proposed
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 (which adds the
priority syntax).
It supports the `target_clones` function attribute and function
multiversioning feature for RISC-V target. It will generate the ifunc
resolver function for the function that declared with target_clones
attribute.
The resolver function will check the version support by runtime object
`__riscv_feature_bits`.
For example:
```
__attribute__((target_clones("default", "arch=+ver1", "arch=+ver2"))) int bar() {
return 1;
}
```
the corresponding resolver will be like:
```
bar.resolver() {
__init_riscv_feature_bits();
// Check arch=+ver1
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION1) == BITMASK_OF_VERSION1) {
return bar.arch=+ver1;
} else {
// Check arch=+ver2
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION2) == BITMASK_OF_VERSION2) {
return bar.arch=+ver2;
} else {
// Default
return bar.default;
}
}
}
```
HLSL allows implicit conversions to truncate vectors to scalar
pr-values. These conversions are scored as vector truncations and should
warn appropriately.
This change allows forming a truncation cast to a pr-value, but not an
l-value. Truncating a vector to a scalar is performed by loading the
first element of the vector and disregarding the remaining elements.
Fixes#102964
Some switch statements require all SVE builtin types to be manually
specified. This patch refactors the SVE_*_TYPE macros so that such code
can be generated during preprocessing.
I've tried to establish a minimal interface that covers all types where
no special information is required and then created a set of macros that
are dedicated to specific datatypes (i.e. int, float).
This patch is groundwork to simplify the changing of SVE tuple types to
become struct based as well as work to support the FP8 ACLE.
We should issue a warning whenever a duplicate resource type attribute
is found. Currently we do that only for `resource_class`. This PR fixes
that by checking for duplicate `is_rov` attributes as well.
Also removes unnecessary parenthesis on `is_rov`.
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.
Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.
`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.
The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types has the control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime for the callee's
frame is shorter to that of the caller.
The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.
When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.
From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for any C++ programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.
After this patch, when compiling coroutines that return a type with such
attribute, the frontend checks that the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to elide the elision.
The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.
The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
This extends default argument deduction to cover class templates as
well, applying only to partial ordering, adding to the provisional
wording introduced in https://github.com/llvm/llvm-project/pull/89807.
This solves some ambuguity introduced in P0522 regarding how template
template parameters are partially ordered, and should reduce the
negative impact of enabling `-frelaxed-template-template-args` by
default.
Given the following example:
```C++
template <class T1, class T2 = float> struct A;
template <class T3> struct B;
template <template <class T4> class TT1, class T5> struct B<TT1<T5>>; // #1
template <class T6, class T7> struct B<A<T6, T7>>; // #2
template struct B<A<int>>;
```
Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This
patch restores the pre-P0522 behavior, `#2` is picked again.