The `sycl_kernel_entry_point` attribute facilitates the generation of an
offload kernel entry point function based on the parameters and body
of the attributed function. This change extends the behavior of that
attribute to support integration with a SYCL runtime library through
an interface that communicates symbol names and kernel arguments
for the generated offload kernel entry point functions.
Consider the following function declared with the
`sycl_kernel_entry_point` attribute with a call to this function
occurring in the implementation of a SYCL kernel invocation function
such as `sycl::handler::single_task()`.
```c++
template<typename KernelName, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelName)]]
void kernel_entry_point(KernelType kernel) {
kernel();
}
```
The body of the above function specifies the parameters and body of the
generated offload kernel entry point. Clearly, a call to the above
function by a SYCL kernel invocation function is not intended to execute
the body as written. Previously, code generation emitted an empty
function body so that calls to the function had no effect other than to
trigger the generation of the offload kernel entry point. The function
body is therefore available to hook for SYCL library support and is now
substituted with a call to a (SYCL library provided) function template
or variable template named `sycl_kernel_launch()` with the kernel
name type passed as the first template argument, the symbol name
of the offload kernel entry point passed as a string literal for the first
function argument, and the function parameters passed as the
remaining explicit function arguments. Given a call like this:
```c++
kernel_entry_point<struct KN>([]{})
```
the body of the instantiated `kernel_entry_point()` specialization would
be substituted as follows with "kernel-symbol-name" substituted for the
generated symbol name and `kernel` forwarded.
```c++
sycl_kernel_launch<KN>("kernel-symbol-name", kernel)
```
Name lookup and overload resolution for the `sycl_kernel_launch()`
function is performed at the point of definition of the
`sycl_kernel_entry_point` attributed function (or the point of
instantiation for an instantiated function template specialization). If
overload resolution fails, the program is ill-formed.
Implementation of the `sycl_kernel_launch()` function might require
additional information provided by the SYCL library. This is facilitated
by removing the previous prohibition against use of the
`sycl_kernel_entry_point` attribute with a non-static member function.
If the `sycl_kernel_entry_point` attributed function is a non-static
member function, then overload resolution for the `sycl_kernel_launch()`
function template may select a non-static member function in which case,
`this` will be implicitly passed as the implicit object argument.
If a `sycl_kernel_entry_point` attributed function is a non-static
member function, use of `this` in a potentially evaluated expression is
prohibited in the definition since `this` is not a kernel argument and
will not be available within the generated offload kernel entry point
function. The attribute cannot be applied to a function with an
explicit object parameter.
---------
Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
This change landed without approval.
This reverts commit 45e666a8531c1148bdb170b9a120f99e1500c427.
This reverts commit a636dd4c37f12594275de2fe180ca35bc04d76ea.
Clang isn't able to support multiple constraints on inputs and outputs,
like "rm". Instead, it picks the "safest" one to use, i.e. the memory
constraint for "rm". This leads to obviously horrible code:
asm __volatile__ ("pushf\n\t"
"popq %0"
: "=rm" (x));
is compiled to:
pushf
popq -8(%rsp)
movq -8(%rsp), %rax
It gets worse when inlined into other functions, because it may
introduce
a stack where none is needed.
With this change, Clang now generates IR for the more optimistic choice
("r"). All but the fast register allocator are able to fold registers if
it turns out that register pressure is too high.
This leaves the fast register allocator. The fast register allocator, as
the name suggests, is built for execution speed, not code quality. Thus,
we add special processing to convert the "optimistic" IR into the
"conservative" choice (again at the IR level), which we know it can
handle.
We focus on "rm" for the initial commit, but that can be expanded in the
future for other constraints where Clang generates ++ungood code (like
"g").
Fixes: https://github.com/llvm/llvm-project/issues/20571
Fix an issue where `FunctionSamples::getCanonicalFnName` incorrectly
canonicalizes omp helper functions to collide with the original function
itself. This causes the sample loader to annotate the wrong functions.
Canonicalization strips everything comes after the first dot (.), unless
the function attribute "sample-profile-suffix-elision-policy" is set to
"selected", in which case it only strips after the known suffixes. The
helper function names have the suffixes like `.omp_outlined`. After
canonicalization, the name becomes the same as the original function.
Add the attribute to helper functions so that the suffixes are not
stripped.
This is the same fix applied previously to coroutine await suspend
wrapper functions (#174881).
This function is needed in identical form for CIR codegen, and pulling
it out into AsmStmt is effectively trivial. The only thing that actually
needs the codegen in it is the ability to diagnose, so this patch adds
that as a callback. AsmStmt seems to be the most logical place for this
to happen, as it does other similar things. Howver, unlike the other
similar things, th is is the same between MS and GCC, so it doesn't need
separate implementations.
This reverts commit b1e511bf5a4c702ace445848b30070ac2e021241.
https://github.com/llvm/llvm-project/issues/160243
Reverting because the GCC C front end is incorrect.
---------
Co-authored-by: Jim Lin <jim@andestech.com>
Added Support for flag output operand "=@cc", inline assembly constraint
for
SystemZ.
- Clang now accepts "=@cc" assembly operands, and sets 2-bits condition
code
for output operand for SyatemZ.
- Clang currently emits an assertion that flag output operands are
boolean
values, i.e. in the range [0, 2). Generalize this mechanism to allow
targets to specify arbitrary range assertions for any inline assembly
output operand. This will be used to assert that SystemZ two-bit
condition-code values are in the range [0, 4).
- SystemZ backend lowers "@cc" targets by using ipm sequence to extract
condition code from PSW.
- DAGCombine tries to optimize lowered ipm sequence by combining
CCReg and computing effective CCMask and CCValid in combineCCMask for
select_ccmask and br_ccmask.
- Cost computation is done for merging conditionals for branch
instruction
in SelectionDAG, as split may cause branches conditions evaluation goes
across basic block and difficult to combine.
---------
Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
This change implements the fuse directive, `#pragma omp fuse`, as specified in the OpenMP 6.0, along with the `looprange` clause in clang.
This change also adds minimal stubs so flang keeps compiling (a full implementation in flang of this directive is still pending).
---------
Co-authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
In C89, for-init variables have function scope, so cleanup should occur
at function exit, not loop exit. This implements deferred cleanup
registration for C89 mode while preserving C99+ behavior.
Fixes#154624
This implements support for [named
loops](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3355.htm) for
C2y.
When parsing a `LabelStmt`, we create the `LabeDecl` early before we parse
the substatement; this label is then passed down to `ParseWhileStatement()`
and friends, which then store it in the loop’s (or switch statement’s) `Scope`;
when we encounter a `break/continue` statement, we perform a lookup for
the label (and error if it doesn’t exist), and then walk the scope stack and
check if there is a scope whose preceding label is the target label, which
identifies the jump target.
The feature is only supported in C2y mode, though a cc1-only option
exists for testing (`-fnamed-loops`), which is mostly intended to try
and make sure that we don’t have to refactor this entire implementation
when/if we start supporting it in C++.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
When returning a value, stores to the `retval` allocas and branches to `return`
block are put in the same atom group. They are both rank 1, which could in
theory introduce an extra step in some optimized code. This low risk currently
feels an acceptable for keeping the code a bit simpler (as opposed to adding
scaffolding to make the store rank 2).
In the case of a single return (no control flow) the return instruction inherits
the atom group of the branch to the return block when the blocks get folded
togather.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
The InstrProf headers are very expensive. Avoid including them in all of
CodeGen/ by moving the CodeGenPGO member behind a unqiue_ptr.
This reduces clang build time by 0.8%.
See test comment for possible future improvement.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
See test comment for possible future improvement.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
This implements the R2 semantics of P0963.
The R1 semantics, as outlined in the paper, were introduced in Clang 6.
In addition to that, the paper proposes swapping the evaluation order of
condition expressions and the initialization of binding declarations
(i.e. std::tuple-like decompositions).
This statement level construct takes no clauses and has no associated
statement, and simply labels a number of array elements as valid for
caching. The implementation here is pretty simple, but it is a touch of
a special case for parsing, so the parsing code reflects that.
Add option and statement attribute for controlling emitting of
target-specific
metadata to atomicrmw instructions in IR.
The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641,
Originally a pragma was proposed, then it was changed to clang
attribute.
This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.
In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the
Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.
During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used
to manage the saving and restoring of atomic options when entering
and exiting nested AttributedStmt.
The atomic construct is a particularly complicated one. The directive
itself is pretty simple, it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.
'read' accepts:
v = x;
'write' accepts:
x = expr;
'update' (or no clause) accepts:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
'capture' accepts either a compound statement, or:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
IF 'capture' has a compound statement, it accepts:
{v = x; x binop= expr; }
{x binop= expr; v = x; }
{v = x; x = x binop expr; }
{v = x; x = expr binop x; }
{x = x binop expr ;v = x; }
{x = expr binop x; v = x; }
{v = x; x = expr; }
{v = x; x++; }
{v = x; ++x; }
{x++; v = x; }
{++x; v = x; }
{v = x; x--; }
{v = x; --x; }
{x--; v = x; }
{--x; v = x; }
While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.
This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
A SYCL kernel entry point function is a non-member function or a static
member function declared with the `sycl_kernel_entry_point` attribute.
Such functions define a pattern for an offload kernel entry point
function to be generated to enable execution of a SYCL kernel on a
device. A SYCL library implementation orchestrates the invocation of
these functions with corresponding SYCL kernel arguments in response to
calls to SYCL kernel invocation functions specified by the SYCL 2020
specification.
The offload kernel entry point function (sometimes referred to as the
SYCL kernel caller function) is generated from the SYCL kernel entry
point function by a transformation of the function parameters followed
by a transformation of the function body to replace references to the
original parameters with references to the transformed ones. Exactly how
parameters are transformed will be explained in a future change that
implements non-trivial transformations. For now, it suffices to state
that a given parameter of the SYCL kernel entry point function may be
transformed to multiple parameters of the offload kernel entry point as
needed to satisfy offload kernel argument passing requirements.
Parameters that are decomposed in this way are reconstituted as local
variables in the body of the generated offload kernel entry point
function.
For example, given the following SYCL kernel entry point function
definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
kernel();
}
```
and the following call:
```
struct Kernel {
int dm1;
int dm2;
void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```
the corresponding offload kernel entry point function that is generated
might look as follows (assuming `Kernel` is a type that requires
decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
Kernel kernel{dm1, dm2};
kernel();
}
```
Other details of the generated offload kernel entry point function, such
as its name and calling convention, are implementation details that need
not be reflected in the AST and may differ across target devices. For
that reason, only the transformation described above is represented in
the AST; other details will be filled in during code generation.
These transformations are represented using new AST nodes introduced
with this change. `OutlinedFunctionDecl` holds a sequence of
`ImplicitParamDecl` nodes and a sequence of statement nodes that
correspond to the transformed parameters and function body.
`SYCLKernelCallStmt` wraps the original function body and associates it
with an `OutlinedFunctionDecl` instance. For the example above, the AST
generated for the `sycl_kernel_entry_point<kernel_name>` specialization
would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
TemplateArgument type 'kernel_name'
TemplateArgument type 'Kernel'
ParmVarDecl kernel 'Kernel'
SYCLKernelCallStmt
CompoundStmt
<original statements>
OutlinedFunctionDecl
ImplicitParamDecl 'dm1' 'int'
ImplicitParamDecl 'dm2' 'int'
CompoundStmt
VarDecl 'kernel' 'Kernel'
<initialization of 'kernel' with 'dm1' and 'dm2'>
<transformed statements with redirected references of 'kernel'>
```
Any ODR-use of the SYCL kernel entry point function will (with future
changes) suffice for the offload kernel entry point to be emitted. An
actual call to the SYCL kernel entry point function will result in a
call to the function. However, evaluation of a `SYCLKernelCallStmt`
statement is a no-op, so such calls will have no effect other than to
trigger emission of the offload kernel entry point.
Additionally, as a related change inspired by code review feedback,
these changes disallow use of the `sycl_kernel_entry_point` attribute
with functions defined with a _function-try-block_. The SYCL 2020
specification prohibits the use of C++ exceptions in device functions.
Even if exceptions were not prohibited, it is unclear what the semantics
would be for an exception that escapes the SYCL kernel entry point
function; the boundary between host and device code could be an implicit
noexcept boundary that results in program termination if violated, or
the exception could perhaps be propagated to host code via the SYCL
library. Pending support for C++ exceptions in device code and clear
semantics for handling them at the host-device boundary, this change
makes use of the `sycl_kernel_entry_point` attribute with a function
defined with a _function-try-block_ an error.
- Adding the changes from PRs:
- #116331
- #121852
- Fixes test `tools/dxil-dis/debug-info.ll`
- Address some missed comments in the previous PR
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
`CounterPair` can hold `<uint32_t, uint32_t>` instead of current
`unsigned`, to hold also the counter number of SkipPath. For now, this
change provides the skeleton and only `CounterPair::Executed` is used.
Each counter number can have `None` to suppress emitting counter
increment. 2nd element `Skipped` is initialized as `None` by default,
since most `Stmt*` don't have a pair of counters.
This change also provides stubs for the verifier. I'll provide the impl
of verifier for `+Asserts` later.
`markStmtAsUsed(bool, Stmt*)` may be used to inform that other side
counter may not emitted.
`markStmtMaybeUsed(S)` may be used for the `Stmt` and its inner will be
excluded for emission in the case of skipping by constant folding. I put
it into places where I found.
`verifyCounterMap()` will check the coverage map and the counter map,
and can be used to report inconsistency.
These verifier methods shall be eliminated in `-Asserts`.
https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
This executable construct has a larger list of clauses than some of the
others, plus has some additional restrictions. This patch implements
the AST node, plus the 'cannot be the body of a if, while, do, switch,
or label' statement restriction. Future patches will handle the
rest of the restrictions, which are based on clauses.
The 'set' construct is another fairly simple one, it doesn't have an
associated statement and only a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its for clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled, it needs a complete
new implementation.
- adding Flatten and Branch to if stmt.
- adding dxil control flow hint metadata generation
- modifing spirv OpSelectMerge to account for the specific attributes.
Closes#70112
---------
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.