This patch restricts the triangular loop iteration count optimization to
only apply to nested triangular loops (`depth >= 2`), not first-level
triangular loops.
The optimization computes iterations as `(Upper - Lower + 1`) for
triangular loops where the inner loop bound depends on an outer loop
counter. However, this formula only works correctly for deeply nested
triangular dependencies: `k` depends on `j`, and `j` itself depends on
`i` For first-level triangular loops: `k` depends directly on `i`, the
standard iteration count formula handles the calculation correctly.
This patch updates the mapping kind for const-qualified variables
from`tofrom` to `to`, ensuring correct and standards-compliant mapping
semantics for const variables.
Parse and perform semantic checks for declare_target 'local' clause.
When compiling for device offloading, generate a warning that 'local' is
not yet fully supported. On the host, 'local' is/will be a no-op, so no
warning is generated.
NOTE: The minimal CodeGen changes allow 'local' to flow through as
equivalent to the 'enter' clause after warning is generated.
Testing:
- Updated messages and ast tests for declare target/declare_target
- ninja check-all.
Summary:
We use this `dyn_ptr` argument in Clang/OpenMP to handle the
`KernelLaunchEnvironment`. This is a per-kernel argument used to share
some information. Currenetly, it's prepended to the argument list and we
generate storage for it in the runtime.
This is bad for a few reasons:
1. It changes the ABI by shifting user arguments
2. It cannot be trivially be left uninitialized if unused
3. The runtime must allocate its own memory for it
This PR changes it to be appended instead. Additionally, space for this
is always emitted. This means the OMPIRBuilder itself will provide the
storage, we simply need to populate it in the runtime if it is used.
This means that if it's unused we don't always pay the cost and it's
easier for non-OpenMP users to ignore it.
Backward compatibility is maintained by auto-upgrading the kernel
arguments. In `libomptarget` we completely allocate a new buffer to
store this in the new format. The plugins still need to respect the old
ABI of the called device object, so we simply rotate it if it's the old
version.
This patch extends the `transparent` clause implementation to properly
handle runtime variable expressions as the `impex-type` argument, as
required by the OpenMP specification:
`"The use of a variable in an impex-type expression causes an implicit
reference to the variable in all enclosing constructs. The impex-type
expression is evaluated in the context outside of the construct on which
the clause appears."`
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
According to OpenMP 5.2 the num_teams clause should support a
lower-bound as modifier for its argument. This PR adds Parsing support
for the lower bound in num_teams clause.
According to the `OpenMP 6.0` specification (Section 17.9.6 511:18), if
the `impex-type` is not explicitly specified, the behavior defaults to
treating it as `omp_impex`. This patch ensures proper handling of the
`transparent` clause when no specific `impex-type` is provided.
Depends on #169603.
This is the `use_device_ptr` counterpart of #168905.
With OpenMP 6.1, a `fallback` modifier can be specified on the
`use_device_ptr` clause to control the behavior when a pointer lookup
fails, i.e. there is no device pointer to translate into.
The default is `fb_preserve` (i.e. retain the original pointer), while
`fb_nullify` means: use `nullptr` as the translated pointer.
Dependent PR: #173930.
This adds support for using `ATTACH` map-type for proper
pointer-attachment when mapping list-items that have base-pointers.
For example, for the following:
```c
int *p;
#pragma omp target enter data map(p[1:10])
```
The following maps are now emitted by clang:
```
(A)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
&p, &p[1], sizeof(p), ATTACH
```
Previously, the two possible maps emitted by clang were:
```
(B)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
(C)
&p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````
(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
-----
With this change, we are using ATTACH-style maps, like `(A)`, for cases
where the expression has a base-pointer. For example:
```cpp
int *p, **pp;
S *ps, **pps;
... map(p[0])
... map(p[10:20])
... map(*p)
... map(([20])p)
... map(ps->a)
... map(pps->p->a)
... map(pp[0][0])
... map(*(pp + 10)[0])
```
#### Grouping of maps based on attach base-pointers
We also group mapping of clauses with the same base decl in the order of
the increasing complexity of their base-pointers, e.g. for something
like:
```
S **spp;
map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
map(spp[0]), // attach-ptr: spp
map(spp), // attach-ptr: N/A
```
We first map `spp`, then `spp[0]` then `spp[0][0]` and `spp[0][0].a`.
This allows us to also group "struct" allocation based on their attach
pointers. This resolves the issues of us always mapping everything from
the beginning of the symbol `spp`. Each group is mapped independently,
and at the same level, like `spp[0][0]` and its member `spp[0][0].a`, we
still get map them together as part of the same contiguous struct
`spp[0][0]`. This resolves issue #141042.
#### use_device_ptr/addr fixes
The handling of `use_device_ptr/addr` was updated to use the attach-ptr
information, and works for many cases that were failing before. It has
to be done as part of this series because otherwise, the switch from
ptr_to_obj to attach-style mapping would have caused regressions in
existing use_device_ptr/addr tests.
#### Handling of attach-pointers that are members of implicitly mapped
structs:
* When a struct member-pointer, like `p` below, is a base-pointer in a
`map` clause on a target construct (like `map(p[0:1])`, and the base of
that struct is either the `this` pointer (implicitly or explicitly), or
a struct that is implicitly mapped on that construct, we add an implicit
`map(p)` so that we don't implicitly map the full struct.
```c
struct S { int *p;
void f1() {
#pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure
// that the implicit map of `this[:]` does
// not map the full struct
printf("%p %p\n", &p, p);
}
```
#### Scope for improvement:
* We may be able to compute attach-ptr expr while collecting
component-lists in Sema.
* But we cache the computation results already, and `findAttachPtrExpr`
is fairly simple, and fast.
* There may be a better way to implement semantic expr comparison.
#### Needs future work:
* Attach-style maps not yet emitted for declare mappers.
* Mapping of class member references: We are still using PTR_AND_OBJ
maps for them. We will likely need to change that to handle
`ref_ptr/ref_ptee`, and `attach` map-type-modifier on them.
* Implicit capturing of "this" needs to map the full `this[0:1]` unless
there is an explicit map on one of the members, or a map with a member
as its base-pointer.
* Implicit map added for capturing a class member pointer needs to also
add a zero-length-array-section map.
* `use_device_addr` on array-sections-on-pointers need further
improvements (documented using FIXMEs)
#### Why a large PR
While it's unfortunate that this PR has gotten large and difficult to
review, the issue is that all the functional changes have to be made
together, to prevent regressions from partially implemented changes.
For example, the changes to capturing were previously done separately
(#145454), but they would still cause stability issues in absence of
full attach-mapping. And attach-mapping needs those changes to be able
to launch kernels.
We extracted the utilities and functions, like those for finding
attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't
call those functions, just adds them (#155625). Maybe the change that
adds a new error message for use_device_addr on array-sections with
non-var base-pointers could have been extracted out too (but that would
have had to be a follow-up change in that case, and we would get
comp-fails with this PR when the erroneous case was not
caught/diagnosed).
---------
Co-authored-by: Alex Duran <alejandro.duran@intel.com>
As described in section 2.14.6 of openmp spec, the patch implements
support for iterator in motion clauses.
---------
Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>
Same changes as in fix for
https://github.com/llvm/llvm-project/pull/165276 except for changes in
test case :
1) remove unnecessary include in test to restore Ubuntu build.
This is not needed as allocatable modifier is not applicable to the
default clause in C/C++.
2) Changes in CHECK statements to accommodate testing failure on
toolchain
builders at Google, Reported by Prabhu Rajasekaran.
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
Same changes as in fix for
[165276](https://github.com/llvm/llvm-project/pull/165276) except for
remove unnecessary <vector> include in test to restore Ubuntu build.
This is not needed as allocatable modifier is not applicable to the
default clause in C/C++.
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
In the default clause taking care of new comments in the previous
"Support for Default clause variable category"
[157063](https://github.com/llvm/llvm-project/pull/157063) and adding a
new test case.
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
The variable-category 'allocatable' is explicitly noted as applying only
to Fortran. If specified in C/C++ it should generate an error. NOTE:
Issue will be filed against OpenMP 6.0 specification that restriction is
missing from 'default' clause section.
From the OpenMP 6.0 specification:
Section 7.5.1 default Clause
Semantics, under Fortran only, L18-19, pg. 223
The allocatable variable-category specifies variables with the
ALLOCATABLE
attribute.
Section 7.9.9 defaultmap Clause
Semantics, under Fortran only, L9-10, pg. 292
The allocatable variable-category specifies variables with the
ALLOCATABLE
attribute.
Restrictions, C/C++
L1, pg. 293
The specified variable-category must not be allocatable.
Per OpenMP 6.0 specification, section 7.5.1, default Clause
Page 224, lines 3-5 default Clause, Semantics
If data-sharing-attribute is shared then the clause has no effect
on a target construct; otherwise, its effect on a target construct is
equivalent to specifying the defaultmap clause with the same
data-sharing-attribute and variable-category.
Testing:
OpenMP LIT tests
check-all
This PR adds support for the `dyn_groupprivate` clause, which will be
part of OpenMP 6.1. This feature allows users to request dynamic shared
memory on target regions.
---------
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
This ammends the fix commited in https://reviews.llvm.org/D109770 /
6cf6fa6ef1c28
Comparing the number of template parameter lists with the number of
template parameters is obviously wrong.
Even then, the number of parameters being the same doesn't mean the
templates are compatible.
This change compares if the template parameters are actually equivalent.
This fixes the crash, but I am not sure what is the design and intention
here, this openmp template support looks too fragile.
The added test case still doesn't work, but at least we don't crash now.
This PR enhances the OpenMP `nowait` clause implementation by adding
support for optional argument in both parsing and semantic analysis
phases.
Reference:
1. OpenMP 6.0 Specification, page 481
Changes to support for array elements in reduction clause e.g.
"reduction (+:a[1])"
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
This change implements the fuse directive, `#pragma omp fuse`, as specified in the OpenMP 6.0, along with the `looprange` clause in clang.
This change also adds minimal stubs so flang keeps compiling (a full implementation in flang of this directive is still pending).
---------
Co-authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
This patch updates the parsing changes to handle the new syntax of the
`uses_allocators` clause as defined in OpenMP 5.2(Section 6.8).
```
// Case 1: Allocator without traits
// < 5.2 → error
// ≥ 5.2 → OK, empty traits set
#pragma omp target teams uses_allocators(cgroup_alloc)
// Case 2: Allocator with traits
// Old syntax (< 5.2):
#pragma omp target teams uses_allocators(cgroup_alloc(cgroup_traits))
// New syntax (≥ 5.2):
#pragma omp target teams uses_allocators(traits(cgroup_traits) : cgroup_alloc)
// Case 3: Multiple allocators
// Old syntax (< 5.2), comma-separated:
#pragma omp target teams uses_allocators(cgroup_alloc(cgroup_traits), aligned_alloc(aligned_traits))
// New syntax (≥ 5.2), semicolon-separated:
#pragma omp target teams uses_allocators(traits(cgroup_traits) : cgroup_alloc; traits(aligned_traits) : aligned_alloc)
```
---------
Co-authored-by: urvi-rav <urvi.rav@hpe.com>
Per OpenMP 6.0 specification, section 7.9.9
Argument keywords, page 291, L17
Semantics, page 292, L15-16
The behavior of 'private' should be described in the same manner as that
of 'firstprivate'
15 ... If implicit-behavior is firstprivate, 16 the attribute is a
data-sharing attribute of firstprivate.
Relevant OpenMP 6.0 issues
defaultmap clause new implicit-behavior 'private' should be documented
https://github.com/OpenMP/spec/issues/4571
Issue 4571: Add missing sentence about private to defaultmap
https://github.com/OpenMP/spec/pull/4577
Testing:
Updated 'defaultmap' error message and codegen LIT tests to verify
behavior of 'private' in OpenMP 6.0.
This is preparatory work for the implementation of `#pragma omp fuse` in
https://github.com/llvm/llvm-project/pull/139293
**Note**: this change builds on top of
https://github.com/llvm/llvm-project/pull/155848
This change adds an additional class to hold data that will be shared
between all loop transformations: those that apply to canonical loop
nests (the majority) and those that apply to canonical loop sequences
(`fuse` in OpenMP 6.0).
This class is not a statement by itself and its goal is to avoid having
to replicate information between classes.
Also simplfiy the way we handle the "generated loops" information as we
currently only need to know if it is zero or non-zero.
First of two patches split from original defaultmap PR:
https://github.com/llvm/llvm-project/pull/157767
Per OpenMP 6.0 specification, section 7.9.9
Argument keywords, page 291, L17
Additional information, page 291, L24-25
24 The value alloc may also be specified as implicit-behavior with
identical meaning to the value 25 storage.
Testing:
Updated 'defaultmap' error message and codegen LIT tests to verify
behavior in OpenMP 6.0.
This is preparatory work for the implementation of `#pragma omp fuse` in
https://github.com/llvm/llvm-project/pull/139293
Not all OpenMP loop transformations makes sense to make them inherit
from `OMPLoopBasedDirective`, in particular in OpenMP 6.0 'fuse' (to be
implemented later) is a transformation of a canonical loop sequence.
This change renames class `OMPLoopTransformationDirective` to
`OMPCanonicalLoopNestTransformationDirective` so we can reclaim that
name in a later change.
When a mapper is declared with an iterator variable inside the map
clause, it results in unintended behavior due to the iterator being
implicitly created but left uninitialized.
Testcase:
```
typedef struct myvec{
size_t len;
double *data;
} myvec_t;
#pragma omp declare mapper(id:myvec_t v) map( iterator( iterator_variable=0:v.len), tofrom: v.data[iterator_variable])
int main()
{
int errors = 0;
myvec_t s;
#pragma omp target map(mapper(id), to:s)
{
}
return 0;
}
```
The error we get while compiling this is:
```
/usr/lib64/gcc/x86_64-suse-linux/14/../../../../x86_64-suse-linux/bin/ld: /tmp/test-f70647.o: in function `.omp_mapper._ZTS5myvec.id':
test.cpp:(.text+0x21a): undefined reference to `iterator_variable'
/llvm-project/install/bin/clang-linker-wrapper: error: 'ld' failed
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
This patch tries to fix this by initializing the iterator variable to a
null constant.
---------
Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>
According to the OpenMP 6.0 specification, array sections with no length
and unknown size are considered assumed-size arrays. As of pull request
https://github.com/llvm/llvm-project/pull/148048
these types of array sections are allowed and can be specified in
clauses that allow array sections as list items. However, only two
clauses explicitly allow array sections that are assumed-size arrays:
- 'map' and 'use_device_addr'.
The other clauses that accept array sections do not explicitly accept
assumed-size arrays:
- inclusive, exclusive, has_device_addr, in_reduction, task_reduction,
reduction These cases should generate an error. See OpenMP 6.0
specification section 7.4 List Item Privatization, Restrictions, p. 222,
L15
Assumed-size arrays must not be privatized
For OpenMP 6.0, function getPrivateItem() now checks for array section
list items that are assumed-size arrays and generates an error if they
are not allowed for the clause.
Testing
- Updated LIT tests for assumed-size array sections to ensure these
clauses generate an error: inclusive, exclusive, has_device_addr,
in_reduction, task_reduction, reduction and that this clause is accepted
(codegen test): use_device_addr
- check-all
- OpenMP_VV (sollve_vv)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757