765 Commits

Author SHA1 Message Date
Tom Honermann
23e4fe040b
[SYCL] SYCL host kernel launch support for the sycl_kernel_entry_point attribute. (#152403)
The `sycl_kernel_entry_point` attribute facilitates the generation of an
offload kernel entry point function based on the parameters and body
of the attributed function. This change extends the behavior of that
attribute to support integration with a SYCL runtime library through
an interface that communicates symbol names and kernel arguments
for the generated offload kernel entry point functions.

Consider the following function declared with the
`sycl_kernel_entry_point` attribute with a call to this function
occurring in the implementation of a SYCL kernel invocation function
such as `sycl::handler::single_task()`.
```c++
  template<typename KernelName, typename KernelType>
  [[clang::sycl_kernel_entry_point(KernelName)]]
  void kernel_entry_point(KernelType kernel) {
    kernel();
  }
```

The body of the above function specifies the parameters and body of the
generated offload kernel entry point. Clearly, a call to the above
function by a SYCL kernel invocation function is not intended to execute
the body as written. Previously, code generation emitted an empty
function body so that calls to the function had no effect other than to
trigger the generation of the offload kernel entry point. The function
body is therefore available to hook for SYCL library support and is now
substituted with a call to a (SYCL library provided) function template
or variable template named `sycl_kernel_launch()` with the kernel
name type passed as the first template argument, the symbol name
of the offload kernel entry point passed as a string literal for the first
function argument, and the function parameters passed as the
remaining explicit function arguments. Given a call like this:
```c++
  kernel_entry_point<struct KN>([]{})
```
the body of the instantiated `kernel_entry_point()` specialization would
be substituted as follows with "kernel-symbol-name" substituted for the
generated symbol name and `kernel` forwarded.
```c++
  sycl_kernel_launch<KN>("kernel-symbol-name", kernel)
```

Name lookup and overload resolution for the `sycl_kernel_launch()`
function is performed at the point of definition of the
`sycl_kernel_entry_point` attributed function (or the point of
instantiation for an instantiated function template specialization). If
overload resolution fails, the program is ill-formed.
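
As a rough host-only illustration of the substitution described above, the following sketch writes out by hand what the compiler would generate. The `sycl_kernel_launch()` definition, the `LaunchRecord` type, `run_example()`, and the symbol string are assumptions made for illustration; a real SYCL library would supply its own launch function, and the attribute is omitted so the example builds with any C++17 compiler.

```c++
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical bookkeeping so the effect of a launch can be observed.
struct LaunchRecord {
  std::string symbol;       // symbol name passed by the substituted body
  std::size_t num_args = 0; // number of forwarded kernel arguments
};
static LaunchRecord last_launch;

// Stand-in for the library-provided launch function (an assumption; a real
// runtime would submit the named offload kernel instead of recording it).
template <typename KernelName, typename... Args>
void sycl_kernel_launch(const char *symbol, Args &&...) {
  last_launch.symbol = symbol;
  last_launch.num_args = sizeof...(Args);
}

// Hand-written equivalent of the attributed function after substitution.
// The attribute itself is left out so that this compiles without SYCL support.
template <typename KernelName, typename KernelType>
void kernel_entry_point(KernelType kernel) {
  // The compiler would replace the original body `kernel();` with this call.
  sycl_kernel_launch<KernelName>("kernel-symbol-name", kernel);
}

struct KN; // kernel name type; it only needs to be declared

// Runs the example and reports what the launch function observed.
LaunchRecord run_example() {
  kernel_entry_point<KN>([] {});
  return last_launch;
}
```

Calling `run_example()` records one launch with a single forwarded argument; the kernel body itself never runs on the host in this sketch.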

Implementation of the `sycl_kernel_launch()` function might require
additional information provided by the SYCL library. This is facilitated
by removing the previous prohibition against use of the
`sycl_kernel_entry_point` attribute with a non-static member function.
If the `sycl_kernel_entry_point` attributed function is a non-static
member function, then overload resolution for the `sycl_kernel_launch()`
function template may select a non-static member function in which case,
`this` will be implicitly passed as the implicit object argument.

If a `sycl_kernel_entry_point` attributed function is a non-static
member function, use of `this` in a potentially evaluated expression is
prohibited in the definition since `this` is not a kernel argument and
will not be available within the generated offload kernel entry point
function. The attribute cannot be applied to a function with an
explicit object parameter.
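
The member function case can be sketched in the same hand-written style. `Handler`, `submitted()`, and the member `sycl_kernel_launch()` below are hypothetical names, not the real `sycl::handler` interface; the point is only that an unqualified call inside a non-static member function can resolve to a member function template, with `this` supplied implicitly.

```c++
#include <cassert>
#include <string>

struct KN2; // hypothetical kernel name type

// Hypothetical handler-like class; not the real sycl::handler.
class Handler {
  std::string submitted_; // library state the launch function needs

  // Member launch function: selected by overload resolution when the call
  // appears inside a non-static member function; `this` is the implicit
  // object argument.
  template <typename KernelName, typename... Args>
  void sycl_kernel_launch(const char *symbol, Args &&...) {
    submitted_ = symbol;
  }

public:
  // Hand-written equivalent of an attributed non-static member function
  // after substitution (attribute omitted so this builds without SYCL).
  template <typename KernelName, typename KernelType>
  void kernel_entry_point(KernelType kernel) {
    // The original body may not use `this` explicitly, because `this` is
    // not a kernel argument; only the substituted call relies on it.
    sycl_kernel_launch<KernelName>("kernel-symbol-name", kernel);
  }

  const std::string &submitted() const { return submitted_; }
};
```

After `Handler h; h.kernel_entry_point<KN2>([] {});`, `h.submitted()` holds the symbol name that the member launch function received.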

---------

Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
2026-03-05 19:16:03 -05:00
Nikita Popov
c4721872af
Revert "[Clang][inlineasm] Add special support for "rm" output constraints (#92040)"
This change landed without approval.

This reverts commit 45e666a8531c1148bdb170b9a120f99e1500c427.
This reverts commit a636dd4c37f12594275de2fe180ca35bc04d76ea.
2026-02-14 15:59:04 +01:00
Bill Wendling
45e666a853
[Clang][inlineasm] Add special support for "rm" output constraints (#92040)
Clang isn't able to support multiple constraints on inputs and outputs,
like "rm". Instead, it picks the "safest" one to use, i.e. the memory
constraint for "rm". This leads to obviously horrible code:

  asm __volatile__ ("pushf\n\t"
                    "popq %0"
                    : "=rm" (x));

is compiled to:

  pushf
  popq -8(%rsp)
  movq -8(%rsp), %rax

It gets worse when inlined into other functions, because it may introduce
a stack where none is needed.

With this change, Clang now generates IR for the more optimistic choice
("r"). All but the fast register allocator are able to fold registers if
it turns out that register pressure is too high.

This leaves the fast register allocator. The fast register allocator, as
the name suggests, is built for execution speed, not code quality. Thus,
we add special processing to convert the "optimistic" IR into the
"conservative" choice (again at the IR level), which we know it can
handle.
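
A minimal sketch of a multi-alternative "rm" output constraint follows. The function name `load_constant` and the instruction used are illustrative choices, not taken from this change, and the non-x86 fallback exists only so the example builds everywhere. Which alternative the compiler actually picks depends on the compiler and version; inspecting the assembly (for example with `-S`) is the way to find out.

```c++
#include <cassert>

// Returns 42 via an inline asm statement whose output operand uses the
// "rm" constraint, i.e. the compiler may place it in a register or in memory.
int load_constant() {
  int x = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  // AT&T syntax: `movl $42, %0` is valid whether %0 is a register or a
  // memory operand, so either constraint alternative assembles.
  asm volatile("movl $42, %0" : "=rm"(x));
#else
  x = 42; // fallback for other targets or compilers; no inline asm involved
#endif
  return x;
}
```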

We focus on "rm" for the initial commit, but that can be expanded in the
future for other constraints where Clang generates ++ungood code (like
"g").

Fixes: https://github.com/llvm/llvm-project/issues/20571
2026-02-14 05:02:24 -08:00
Wei Wang
9dde0a803b
[SampleProf][OMP] Handle OMP helper function name canonicalization (#178339)
Fix an issue where `FunctionSamples::getCanonicalFnName` incorrectly
canonicalizes omp helper functions to collide with the original function
itself. This causes the sample loader to annotate the wrong functions.
Canonicalization strips everything that comes after the first dot (.), unless
the function attribute "sample-profile-suffix-elision-policy" is set to
"selected", in which case it only strips after the known suffixes. The
helper function names have the suffixes like `.omp_outlined`. After
canonicalization, the name becomes the same as the original function.
Add the attribute to helper functions so that the suffixes are not
stripped.

This is the same fix applied previously to coroutine await suspend
wrapper functions (#174881).
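
The two canonicalization behaviours can be sketched as follows. This is not the code of `FunctionSamples::getCanonicalFnName`; the `ElisionPolicy` enum, the `canonicalize` function, and the list of known suffixes are assumptions chosen to illustrate why `.omp_outlined` survives under the "selected" policy.

```c++
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical model of the suffix elision policy.
enum class ElisionPolicy { All, Selected };

// Returns the canonical form of `name` under the given policy.
std::string_view canonicalize(std::string_view name, ElisionPolicy policy) {
  if (policy == ElisionPolicy::All) {
    // Strip everything after the first dot; find() yields npos when there
    // is no dot, in which case substr() returns the whole name.
    return name.substr(0, name.find('.'));
  }
  // Assumed list of known suffixes; only these are stripped.
  static constexpr std::string_view known[] = {".llvm.", ".part.", ".__uniq."};
  std::size_t cut = std::string_view::npos;
  for (std::string_view suffix : known) {
    std::size_t pos = name.find(suffix);
    if (pos < cut)
      cut = pos; // keep the earliest known suffix
  }
  return name.substr(0, cut);
}
```

Under `ElisionPolicy::All`, `foo.omp_outlined` collapses to `foo` and collides with the original function; under `ElisionPolicy::Selected` it is left unchanged.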
2026-01-30 11:43:10 -08:00
NAKAMURA Takumi
f86fab6105
[Coverage][Single] Enable Branch coverage for IfStmt (#113111)
Depends on: #112730 #113114


https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
2026-01-29 13:30:49 +09:00
NAKAMURA Takumi
ea509d2857
[Coverage][Single] Enable Branch coverage for SwitchStmt (#113112)
Depends on: #112730 #113114


https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
2026-01-29 10:33:58 +09:00
NAKAMURA Takumi
599c2a0063
[Coverage][Single] Enable Branch coverage for loop statements (#113109)
Depends on: #112730 #113114


https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
2026-01-29 07:46:19 +09:00
Erich Keane
04c83c3498
[NFCI] Extract out the addVariableConstraints CGASM Function (#175261)
This function is needed in identical form for CIR codegen, and pulling
it out into AsmStmt is effectively trivial. The only thing that actually
needs the codegen in it is the ability to diagnose, so this patch adds
that as a callback. AsmStmt seems to be the most logical place for this
to happen, as it does other similar things. However, unlike the other
similar things, this is the same between MS and GCC, so it doesn't need
separate implementations.
2026-01-14 06:31:13 -08:00
Sirraide
71bfdd1304
[Clang] Add support for the C _Defer TS (#162848)
This implements WG14 N3734 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3734.pdf),
aka `_Defer`; it is currently only supported in C if `-fdefer-ts` is passed.
2025-12-11 05:54:09 +01:00
KaiWeng
d9c7c76269
Revert "Ignore trailing NullStmts in StmtExprs for GCC compatibility." (#166036)
This reverts commit b1e511bf5a4c702ace445848b30070ac2e021241.

https://github.com/llvm/llvm-project/issues/160243
Reverting because the GCC C front end is incorrect.

---------

Co-authored-by: Jim Lin <jim@andestech.com>
2025-11-07 09:30:53 -05:00
anoopkg6
6712e20c52
Add support for flag output operand "=@cc" for SystemZ. (#125970)
Added support for the flag output operand "=@cc" inline assembly
constraint for SystemZ.

- Clang now accepts "=@cc" assembly operands, and sets a 2-bit condition
  code for the output operand on SystemZ.

- Clang currently emits an assertion that flag output operands are
  boolean values, i.e. in the range [0, 2). Generalize this mechanism to
  allow targets to specify arbitrary range assertions for any inline
  assembly output operand. This will be used to assert that SystemZ
  two-bit condition-code values are in the range [0, 4).

- The SystemZ backend lowers "@cc" targets by using an ipm sequence to
  extract the condition code from the PSW.

- DAGCombine tries to optimize the lowered ipm sequence by combining
  CCReg and computing the effective CCMask and CCValid in combineCCMask
  for select_ccmask and br_ccmask.

- Cost computation is done for merging conditionals for branch
  instructions in SelectionDAG, as a split may cause branch condition
  evaluation to go across basic blocks, making it difficult to combine.

---------

Co-authored-by: anoopkg6 <anoopkg6@github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
2025-10-14 11:53:42 +02:00
Walter J.T.V
cd4c5280c7
[Clang][OpenMP][LoopTransformations] Implement "#pragma omp fuse" loop transformation directive and "looprange" clause (#139293)
This change implements the fuse directive, `#pragma omp fuse`, as specified in OpenMP 6.0, along with the `looprange` clause in clang.

This change also adds minimal stubs so flang keeps compiling (a full implementation in flang of this directive is still pending).

---------

Co-authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
2025-09-29 07:48:18 +02:00
Iris Shi
ddfbfd6b58
[NFC][clang] Move simplifyConstraint to TargetInfo.cpp (#154905)
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
2025-09-28 10:07:27 +02:00
Jongmyeong Choi
60b3cc69af
[CodeGen] Fix cleanup attribute for C89 for-loop init variables (#156643)
In C89, for-init variables have function scope, so cleanup should occur
at function exit, not loop exit. This implements deferred cleanup
registration for C89 mode while preserving C99+ behavior.

Fixes #154624
2025-09-23 20:35:43 -07:00
Sirraide
e4a1b5f36e
[Clang] [C2y] Implement N3355 ‘Named Loops’ (#152870)
This implements support for [named
loops](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3355.htm) for
C2y. 

When parsing a `LabelStmt`, we create the `LabelDecl` early before we parse
the substatement; this label is then passed down to `ParseWhileStatement()` 
and friends, which then store it in the loop’s (or switch statement’s) `Scope`; 
when we encounter a `break/continue` statement, we perform a lookup for 
the label (and error if it doesn’t exist), and then walk the scope stack and 
check if there is a scope whose preceding label is the target label, which 
identifies the jump target.
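
The scope walk described above can be sketched roughly as follows. The `Scope` struct and `find_jump_target` function are simplified, hypothetical stand-ins and not Clang's actual `Scope` class; the sketch only covers labeled jumps and assumes the label lookup itself has already succeeded.

```c++
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parser scope.
struct Scope {
  bool is_breakable;           // loop or switch statement
  bool is_continuable;         // loop only
  std::string preceding_label; // empty if the statement is unlabeled
};

// Walks the scope stack from innermost to outermost and returns the index
// of the scope whose preceding label matches `label` (which must be
// non-empty), or nullopt if the label does not name an enclosing target.
std::optional<std::size_t> find_jump_target(const std::vector<Scope> &stack,
                                            const std::string &label,
                                            bool is_continue) {
  for (std::size_t i = stack.size(); i-- > 0;) {
    const Scope &s = stack[i];
    bool eligible = is_continue ? s.is_continuable : s.is_breakable;
    if (eligible && s.preceding_label == label)
      return i;
  }
  return std::nullopt;
}
```

With a stack of a loop labeled `outer`, an unlabeled loop, and a switch labeled `sw`, a `break outer` resolves to the outermost scope, while `continue sw` finds no target because a switch is not continuable.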

The feature is only supported in C2y mode, though a cc1-only option
exists for testing (`-fnamed-loops`), which is mostly intended to try
and make sure that we don’t have to refactor this entire implementation
when/if we start supporting it in C++.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2025-09-02 16:37:19 +00:00
Matheus Izvekov
91cdd35008
[clang] Improve nested name specifier AST representation (#147835)
This is a major change to how we represent nested name qualifications in
the AST.

* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed; all type nodes to which it could
previously apply can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.

This has great results on compile-time-tracker as well:

![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831)

This patch also further enables other optimizations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.

It has some other miscellaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.

The rest and bulk of the changes are mostly consequences of the changes
in API.

PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier rebasing. I plan to rename it back after this lands.

Fixes #136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
2025-08-09 05:06:53 -03:00
Orlando Cazalet-Hyams
bbe912f1e7
[KeyInstr] Inline asm atoms (#149076)
2025-07-22 17:19:58 +01:00
Orlando Cazalet-Hyams
5c7c8558c8
[KeyInstr] goto stmt atoms (#149101)
2025-07-21 11:09:40 +01:00
Kazu Hirata
ae372bfca8
[CodeGen] Use range-based for loops (NFC) (#145142)
2025-06-21 08:20:57 -07:00
Orlando Cazalet-Hyams
54d544b831
[KeyInstr][Clang] Ret atom (#134652)
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

When returning a value, stores to the `retval` allocas and branches to the
`return` block are put in the same atom group. They are both rank 1, which
could in theory introduce an extra step in some optimized code. This low
risk currently feels acceptable in exchange for keeping the code a bit
simpler (as opposed to adding scaffolding to make the store rank 2).

In the case of a single return (no control flow) the return instruction inherits
the atom group of the branch to the return block when the blocks get folded
together.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-06-04 15:43:49 +01:00
Orlando Cazalet-Hyams
ac42923c2d
Reapply "[KeyInstr][Clang] For range stmt atoms" (#142630)
This reverts commit e6529dcedb3955706a8af5710591f1ac1bac26a3 with crash fixed.

Original PR https://github.com/llvm/llvm-project/pull/134647

This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-06-04 10:53:29 +01:00
Orlando Cazalet-Hyams
e6529dcedb
Revert "[KeyInstr][Clang] For range stmt atoms" (#142630)
Reverts llvm/llvm-project#134647

Bot failure:

https://lab.llvm.org/buildbot/#/builders/144/builds/26730/steps/6/logs/FAIL__Clang__terminate-statements_cpp
2025-06-03 16:15:46 +01:00
Orlando Cazalet-Hyams
10024363dd
[KeyInstr][Clang] For range stmt atoms (#134647)
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-06-03 15:44:15 +01:00
Orlando Cazalet-Hyams
8e50e882a8
[KeyInstr][Clang] Break and Continue stmt atoms
[KeyInstr][Clang] For stmt atom (#134646)
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-06-03 14:25:48 +01:00
Orlando Cazalet-Hyams
0555594195
[KeyInstr][Clang] For stmt atom (#134646)
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-06-03 13:47:32 +01:00
Nikita Popov
e2b536431d
[CodeGen] Move CodeGenPGO behind unique_ptr (NFC) (#142155)
The InstrProf headers are very expensive. Avoid including them in all of
CodeGen/ by moving the CodeGenPGO member behind a unique_ptr.

This reduces clang build time by 0.8%.
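
The general technique can be sketched in a single file as shown below. `CodeGenLike` and `ExpensiveState` are hypothetical names standing in for a widely included class and a type whose header is costly to include; this is not the actual CodeGen code.

```c++
#include <cassert>
#include <memory>

// --- What a widely included header would contain ---
class ExpensiveState; // forward declaration only; the heavy header is not included

class CodeGenLike {
  std::unique_ptr<ExpensiveState> state_; // an incomplete type is fine here

public:
  CodeGenLike();
  ~CodeGenLike(); // must be defined where ExpensiveState is complete
  int counter() const;
  void bump();
};

// --- What a single .cpp file would contain ---
class ExpensiveState {
public:
  int counter = 0;
};

CodeGenLike::CodeGenLike() : state_(std::make_unique<ExpensiveState>()) {}
CodeGenLike::~CodeGenLike() = default;
int CodeGenLike::counter() const { return state_->counter; }
void CodeGenLike::bump() { ++state_->counter; }
```

Only the translation unit that defines the constructor, destructor, and accessors needs the full definition; every other includer sees just the forward declaration.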
2025-06-02 09:51:54 +02:00
Orlando Cazalet-Hyams
dd8eb1e673
[KeyInstr][Clang] Switch stmt atom (#134643)
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-05-27 11:26:40 +01:00
Orlando Cazalet-Hyams
6bd3543a4d
[KeyInstr][Clang] While stmt atom (#134645)
See test comment for possible future improvement.

This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-05-23 14:42:28 +01:00
Orlando Cazalet-Hyams
189d5dad36
[KeyInstr][Clang] Do stmt atom (#134644)
See test comment for possible future improvement.

This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
2025-05-23 14:31:18 +01:00
joaosaffran
567b0f8923
[HLSL] Add support to branch/flatten attributes to switch (#131739)
closes: [#125754](https://github.com/llvm/llvm-project/issues/125754)

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
2025-03-24 16:17:19 -07:00
cor3ntin
911b200ce3
[Clang] Constant Expressions inside of GCC's asm strings (#131003)
Implements GCC's constexpr string ASM extension
https://gcc.gnu.org/onlinedocs/gcc/Asm-constexprs.html
2025-03-17 20:10:46 +01:00
Younan Zhang
f4218753ad
[Clang] Implement P0963R3 "Structured binding declaration as a condition" (#130228)
This implements the R2 semantics of P0963.

The R1 semantics, as outlined in the paper, were introduced in Clang 6.
In addition to that, the paper proposes swapping the evaluation order of
condition expressions and the initialization of binding declarations
(i.e. std::tuple-like decompositions).
2025-03-11 15:41:56 +08:00
erichkeane
d5cec386c1
[OpenACC] Implement 'cache' construct AST/Sema
This statement level construct takes no clauses and has no associated
statement, and simply labels a number of array elements as valid for
caching. The implementation here is pretty simple, but it is a touch of
a special case for parsing, so the parsing code reflects that.
2025-03-03 13:57:23 -08:00
Yaxun (Sam) Liu
240f2269ff
Add clang atomic control options and attribute (#114841)
Add option and statement attribute for controlling emitting of
target-specific metadata to atomicrmw instructions in IR.

The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
Originally a pragma was proposed, then it was changed to a clang
attribute.

This attribute allows users to specify one, two, or all three options
and must be applied to a compound statement. The attribute can also be
nested, with inner attributes overriding the options specified by outer
attributes or the target's default options. These options will then
determine the target-specific metadata added to atomic instructions in
the IR.

In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is
aliased to `-f[no-]atomic-ignore-denormal-mode`.

In terms of implementation, the atomic attribute is represented in the
AST by the existing AttributedStmt, with minimal changes to AST and
Sema.

During code generation in Clang, the CodeGenModule maintains the current
atomic options, which are used to emit the relevant metadata for atomic
instructions. RAII is used to manage the saving and restoring of atomic
options when entering and exiting nested AttributedStmt.
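
The save-and-restore behaviour for nested attributes might look roughly like the sketch below. `AtomicOptions`, `AtomicOverrides`, and `AtomicOptionsScope` are hypothetical names invented for this illustration and are not the types used in Clang; the sketch only models "inner overrides outer, restored on exit".

```c++
#include <cassert>
#include <optional>

// Hypothetical model of the three options controlled by the attribute.
struct AtomicOptions {
  bool remote_memory = false;
  bool fine_grained_memory = false;
  bool ignore_denormal_mode = false;
};

// An attribute may specify one, two, or all three options; unspecified
// options are inherited from the enclosing scope.
struct AtomicOverrides {
  std::optional<bool> remote_memory, fine_grained_memory, ignore_denormal_mode;
};

// RAII guard: applies the overrides on entry, restores the saved options on exit.
class AtomicOptionsScope {
  AtomicOptions &current_;
  AtomicOptions saved_;

public:
  AtomicOptionsScope(AtomicOptions &current, const AtomicOverrides &o)
      : current_(current), saved_(current) {
    if (o.remote_memory) current_.remote_memory = *o.remote_memory;
    if (o.fine_grained_memory) current_.fine_grained_memory = *o.fine_grained_memory;
    if (o.ignore_denormal_mode) current_.ignore_denormal_mode = *o.ignore_denormal_mode;
  }
  ~AtomicOptionsScope() { current_ = saved_; }
  AtomicOptionsScope(const AtomicOptionsScope &) = delete;
  AtomicOptionsScope &operator=(const AtomicOptionsScope &) = delete;
};
```

Constructing one guard per nested attributed statement gives the inner statement its own view of the options, and leaving the statement restores the outer view automatically.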
2025-02-27 10:41:04 -05:00
Zahira Ammarguellat
cf69b4c668
[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#126927)
This patch was reviewed and approved here:
https://github.com/llvm/llvm-project/pull/119891
However it has been reverted here:
083df25dc2
due to a build issue here:
https://lab.llvm.org/buildbot/#/builders/51/builds/10694

This patch is reintroducing the support.
2025-02-13 07:14:36 -05:00
Sameer Sahasrabuddhe
b85e71b9f2
[llvm] Create() functions for ConvergenceControlInst (#125627)
2025-02-05 11:41:26 +05:30
erichkeane
99a9133a68
[OpenACC] Implement Sema/AST for 'atomic' construct
The atomic construct is a particularly complicated one.  The directive
itself is pretty simple, it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.

'read' accepts:
  v = x;
'write' accepts:
  x = expr;
'update' (or no clause) accepts:
  x++;
  x--;
  ++x;
  --x;
  x binop= expr;
  x = x binop expr;
  x = expr binop x;

'capture' accepts either a compound statement, or:
  v = x++;
  v = x--;
  v = ++x;
  v = --x;
  v = x binop= expr;
  v = x = x binop expr;
  v = x = expr binop x;

If 'capture' has a compound statement, it accepts:
  {v = x; x binop= expr; }
  {x binop= expr; v = x; }
  {v = x; x = x binop expr; }
  {v = x; x = expr binop x; }
  {x = x binop expr; v = x; }
  {x = expr binop x; v = x; }
  {v = x; x = expr; }
  {v = x; x++; }
  {v = x; ++x; }
  {x++; v = x; }
  {++x; v = x; }
  {v = x; x--; }
  {v = x; --x; }
  {x--; v = x; }
  {--x; v = x; }

While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.

This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
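
A few of the accepted statement forms can be written out as a small sketch. The function `atomic_forms` and the `AtomicResult` struct are illustrative names only. The example assumes it is built without OpenACC enabled, in which case the pragmas are ignored and the statements run sequentially; whether a given compiler accepts each form with OpenACC enabled depends on its level of support.

```c++
#include <cassert>

struct AtomicResult {
  int x;
  int v;
};

// Exercises one statement form per atomic-clause and returns the final values.
AtomicResult atomic_forms() {
  int x = 0;
  int v = 0;

#pragma acc atomic update
  x++; // 'update' form `x++`: x becomes 1

#pragma acc atomic write
  x = 10; // 'write' form `x = expr`

#pragma acc atomic read
  v = x; // 'read' form `v = x`: v becomes 10

#pragma acc atomic capture
  v = x++; // 'capture' expression form: v becomes 10, x becomes 11

#pragma acc atomic capture
  {
    v = x;  // compound 'capture' form `{v = x; x binop= expr; }`
    x += 5; // v is 11, x becomes 16
  }

  return {x, v};
}
```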
2025-02-03 07:22:22 -08:00
Tom Honermann
8fb42300a0
[SYCL] AST support for SYCL kernel entry point functions. (#122379)
A SYCL kernel entry point function is a non-member function or a static
member function declared with the `sycl_kernel_entry_point` attribute.
Such functions define a pattern for an offload kernel entry point
function to be generated to enable execution of a SYCL kernel on a
device. A SYCL library implementation orchestrates the invocation of
these functions with corresponding SYCL kernel arguments in response to
calls to SYCL kernel invocation functions specified by the SYCL 2020
specification.

The offload kernel entry point function (sometimes referred to as the
SYCL kernel caller function) is generated from the SYCL kernel entry
point function by a transformation of the function parameters followed
by a transformation of the function body to replace references to the
original parameters with references to the transformed ones. Exactly how
parameters are transformed will be explained in a future change that
implements non-trivial transformations. For now, it suffices to state
that a given parameter of the SYCL kernel entry point function may be
transformed to multiple parameters of the offload kernel entry point as
needed to satisfy offload kernel argument passing requirements.
Parameters that are decomposed in this way are reconstituted as local
variables in the body of the generated offload kernel entry point
function.

For example, given the following SYCL kernel entry point function
definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```

and the following call:
```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```

the corresponding offload kernel entry point function that is generated
might look as follows (assuming `Kernel` is a type that requires
decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```

Other details of the generated offload kernel entry point function, such
as its name and calling convention, are implementation details that need
not be reflected in the AST and may differ across target devices. For
that reason, only the transformation described above is represented in
the AST; other details will be filled in during code generation.

These transformations are represented using new AST nodes introduced
with this change. `OutlinedFunctionDecl` holds a sequence of
`ImplicitParamDecl` nodes and a sequence of statement nodes that
correspond to the transformed parameters and function body.
`SYCLKernelCallStmt` wraps the original function body and associates it
with an `OutlinedFunctionDecl` instance. For the example above, the AST
generated for the `sycl_kernel_entry_point<kernel_name>` specialization
would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```

Any ODR-use of the SYCL kernel entry point function will (with future
changes) suffice for the offload kernel entry point to be emitted. An
actual call to the SYCL kernel entry point function will result in a
call to the function. However, evaluation of a `SYCLKernelCallStmt`
statement is a no-op, so such calls will have no effect other than to
trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback,
these changes disallow use of the `sycl_kernel_entry_point` attribute
with functions defined with a _function-try-block_. The SYCL 2020
specification prohibits the use of C++ exceptions in device functions.
Even if exceptions were not prohibited, it is unclear what the semantics
would be for an exception that escapes the SYCL kernel entry point
function; the boundary between host and device code could be an implicit
noexcept boundary that results in program termination if violated, or
the exception could perhaps be propagated to host code via the SYCL
library. Pending support for C++ exceptions in device code and clear
semantics for handling them at the host-device boundary, this change
makes use of the `sycl_kernel_entry_point` attribute with a function
defined with a _function-try-block_ an error.
2025-01-22 16:39:08 -05:00
CHANDRA GHALE
30f9a4f754
[OpenMP] codegen support for masked combined construct parallel masked taskloop simd. (#121746)
Added codegen support for combined masked constructs `Parallel masked
taskloop simd`.
Added implementation for `EmitOMPParallelMaskedTaskLoopSimdDirective`.

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-14 18:26:46 +05:30
joaosaffran
380bb51b70
[HLSL] Adding Flatten and Branch if attributes with test fixes (#122157)
- Adding the changes from PRs: 
  - #116331 
  - #121852 
- Fixes test `tools/dxil-dis/debug-info.ll`
- Address some missed comments in the previous PR

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
2025-01-13 10:31:25 -08:00
CHANDRA GHALE
6f558e0e12
[OpenMP] codegen support for masked combined construct masked taskloop (#121914)
Added codegen support for combined masked constructs `masked taskloop`.
Added implementation for `EmitOMPMaskedTaskLoopDirective`.

---------

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-13 11:42:13 +05:30
CHANDRA GHALE
1d2eea962a
[OpenMP] codegen support for masked combined construct masked taskloop simd (#121916)
Added codegen support for combined masked constructs `masked taskloop
simd`.
Added implementation for `EmitOMPMaskedTaskLoopSimdDirective`.

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-12 23:38:00 +05:30
CHANDRA GHALE
aedb30fdc7
[OpenMP] codegen support for masked combined construct parallel masked taskloop (#121741)
Added codegen support for combined masked constructs Parallel masked
taskloop.
Added implementation for EmitOMPParallelMaskedTaskLoopDirective.

---------

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-09 16:38:36 +05:30
NAKAMURA Takumi
397ac44f62
[Coverage] Introduce the type CounterPair for RegionCounterMap. NFC. (#112724)
`CounterPair` holds `<uint32_t, uint32_t>` instead of the current
`unsigned`, so that it can also hold the counter number of SkipPath. For
now, this change provides the skeleton and only `CounterPair::Executed`
is used.

Each counter number can be `None` to suppress emitting the counter
increment. The second element, `Skipped`, is initialized as `None` by
default, since most `Stmt*` don't have a pair of counters.
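
The shape of such a pair can be sketched as below. This is only an illustration that uses `std::optional` to model `None`; the actual representation in the patch may differ, and `should_emit_increment` is a hypothetical helper.

```c++
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a pair of counter numbers, each of which may be absent.
struct CounterPair {
  std::optional<std::uint32_t> Executed; // counter for the executed path
  std::optional<std::uint32_t> Skipped;  // counter for the skipped path; None by default

  CounterPair() = default;
  explicit CounterPair(std::uint32_t executed) : Executed(executed) {}
};

// A counter increment is emitted only when the counter number is present.
inline bool should_emit_increment(const std::optional<std::uint32_t> &counter) {
  return counter.has_value();
}
```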

This change also provides stubs for the verifier. The implementation of
the verifier for `+Asserts` builds will follow later.

`markStmtAsUsed(bool, Stmt*)` may be used to indicate that the other
side's counter may not be emitted.

`markStmtMaybeUsed(S)` may be used for a `Stmt` whose inner statements
will be excluded from emission when skipped by constant folding. I put
it into the places I found.

`verifyCounterMap()` will check the coverage map and the counter map,
and can be used to report inconsistency.

These verifier methods shall be eliminated in `-Asserts`.


https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
2025-01-09 17:11:07 +09:00
Chris B
b66f6b25cb
Revert #116331 & #121852 (#122105)
2025-01-08 08:55:02 -06:00
erichkeane
db81e8c42e
[OpenACC] Initial sema implementation of 'update' construct
This executable construct has a larger list of clauses than some of the
others, plus has some additional restrictions.  This patch implements
the AST node, plus the 'cannot be the body of an if, while, do, switch,
or label' statement restriction.  Future patches will handle the
rest of the restrictions, which are based on clauses.
2025-01-07 08:20:20 -08:00
erichkeane
21c785d7bd
[OpenACC] Implement 'set' construct sema
The 'set' construct is another fairly simple one, it doesn't have an
associated statement and only a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its four clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled, it needs a complete
new implementation.
2025-01-06 11:03:18 -08:00
joaosaffran
0d5c07285f
[HLSL] Adding Flatten and Branch if attributes (#116331)
- adding Flatten and Branch to if stmt.
- adding dxil control flow hint metadata generation
- modifying spirv OpSelectMerge to account for the specific attributes.

Closes #70112

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
2025-01-06 10:27:02 -08:00
Sameer Sahasrabuddhe
df67e37e37
[clang][NFC] clean up the handling of convergence control tokens (#121738)
2025-01-06 21:34:11 +05:30
erichkeane
4bbdb018a6
[OpenACC] Implement 'init' and 'shutdown' constructs
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented.  This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
2024-12-19 12:21:50 -08:00