Depends on #169603.
This is the `use_device_ptr` counterpart of #168905.
With OpenMP 6.1, a `fallback` modifier can be specified on the
`use_device_ptr` clause to control the behavior when a pointer lookup
fails, i.e. there is no device pointer to translate into.
The default is `fb_preserve` (i.e. retain the original pointer), while
`fb_nullify` means: use `nullptr` as the translated pointer.
Dependent PR: #173930.
This adds support for using `ATTACH` map-type for proper
pointer-attachment when mapping list-items that have base-pointers.
For example, for the following:
```c
int *p;
#pragma omp target enter data map(p[1:10])
```
The following maps are now emitted by clang:
```
(A)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
&p, &p[1], sizeof(p), ATTACH
```
Previously, the two possible maps emitted by clang were:
```
(B)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
(C)
&p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````
(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
-----
With this change, we are using ATTACH-style maps, like `(A)`, for cases
where the expression has a base-pointer. For example:
```cpp
int *p, **pp;
S *ps, **pps;
... map(p[0])
... map(p[10:20])
... map(*p)
... map(([20])p)
... map(ps->a)
... map(pps->p->a)
... map(pp[0][0])
... map(*(pp + 10)[0])
```
#### Grouping of maps based on attach base-pointers
We also group mapping of clauses with the same base decl in the order of
the increasing complexity of their base-pointers, e.g. for something
like:
```
S **spp;
map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
map(spp[0]), // attach-ptr: spp
map(spp), // attach-ptr: N/A
```
We first map `spp`, then `spp[0]` then `spp[0][0]` and `spp[0][0].a`.
This allows us to also group "struct" allocation based on their attach
pointers. This resolves the issues of us always mapping everything from
the beginning of the symbol `spp`. Each group is mapped independently,
and at the same level, like `spp[0][0]` and its member `spp[0][0].a`, we
still get map them together as part of the same contiguous struct
`spp[0][0]`. This resolves issue #141042.
#### use_device_ptr/addr fixes
The handling of `use_device_ptr/addr` was updated to use the attach-ptr
information, and works for many cases that were failing before. It has
to be done as part of this series because otherwise, the switch from
ptr_to_obj to attach-style mapping would have caused regressions in
existing use_device_ptr/addr tests.
#### Handling of attach-pointers that are members of implicitly mapped
structs:
* When a struct member-pointer, like `p` below, is a base-pointer in a
`map` clause on a target construct (like `map(p[0:1])`, and the base of
that struct is either the `this` pointer (implicitly or explicitly), or
a struct that is implicitly mapped on that construct, we add an implicit
`map(p)` so that we don't implicitly map the full struct.
```c
struct S { int *p;
void f1() {
#pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure
// that the implicit map of `this[:]` does
// not map the full struct
printf("%p %p\n", &p, p);
}
```
#### Scope for improvement:
* We may be able to compute attach-ptr expr while collecting
component-lists in Sema.
* But we cache the computation results already, and `findAttachPtrExpr`
is fairly simple, and fast.
* There may be a better way to implement semantic expr comparison.
#### Needs future work:
* Attach-style maps not yet emitted for declare mappers.
* Mapping of class member references: We are still using PTR_AND_OBJ
maps for them. We will likely need to change that to handle
`ref_ptr/ref_ptee`, and `attach` map-type-modifier on them.
* Implicit capturing of "this" needs to map the full `this[0:1]` unless
there is an explicit map on one of the members, or a map with a member
as its base-pointer.
* Implicit map added for capturing a class member pointer needs to also
add a zero-length-array-section map.
* `use_device_addr` on array-sections-on-pointers need further
improvements (documented using FIXMEs)
#### Why a large PR
While it's unfortunate that this PR has gotten large and difficult to
review, the issue is that all the functional changes have to be made
together, to prevent regressions from partially implemented changes.
For example, the changes to capturing were previously done separately
(#145454), but they would still cause stability issues in absence of
full attach-mapping. And attach-mapping needs those changes to be able
to launch kernels.
We extracted the utilities and functions, like those for finding
attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't
call those functions, just adds them (#155625). Maybe the change that
adds a new error message for use_device_addr on array-sections with
non-var base-pointers could have been extracted out too (but that would
have had to be a follow-up change in that case, and we would get
comp-fails with this PR when the erroneous case was not
caught/diagnosed).
---------
Co-authored-by: Alex Duran <alejandro.duran@intel.com>
As described in section 2.14.6 of openmp spec, the patch implements
support for iterator in motion clauses.
---------
Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>
Same changes as in fix for
https://github.com/llvm/llvm-project/pull/165276 except for changes in
test case :
1) remove unnecessary include in test to restore Ubuntu build.
This is not needed as allocatable modifier is not applicable to the
default clause in C/C++.
2) Changes in CHECK statements to accommodate testing failure on
toolchain
builders at Google, Reported by Prabhu Rajasekaran.
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
Same changes as in fix for
[165276](https://github.com/llvm/llvm-project/pull/165276) except for
remove unnecessary <vector> include in test to restore Ubuntu build.
This is not needed as allocatable modifier is not applicable to the
default clause in C/C++.
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
In the default clause taking care of new comments in the previous
"Support for Default clause variable category"
[157063](https://github.com/llvm/llvm-project/pull/157063) and adding a
new test case.
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
The variable-category 'allocatable' is explicitly noted as applying only
to Fortran. If specified in C/C++ it should generate an error. NOTE:
Issue will be filed against OpenMP 6.0 specification that restriction is
missing from 'default' clause section.
From the OpenMP 6.0 specification:
Section 7.5.1 default Clause
Semantics, under Fortran only, L18-19, pg. 223
The allocatable variable-category specifies variables with the
ALLOCATABLE
attribute.
Section 7.9.9 defaultmap Clause
Semantics, under Fortran only, L9-10, pg. 292
The allocatable variable-category specifies variables with the
ALLOCATABLE
attribute.
Restrictions, C/C++
L1, pg. 293
The specified variable-category must not be allocatable.
Per OpenMP 6.0 specification, section 7.5.1, default Clause
Page 224, lines 3-5 default Clause, Semantics
If data-sharing-attribute is shared then the clause has no effect
on a target construct; otherwise, its effect on a target construct is
equivalent to specifying the defaultmap clause with the same
data-sharing-attribute and variable-category.
Testing:
OpenMP LIT tests
check-all
This PR adds support for the `dyn_groupprivate` clause, which will be
part of OpenMP 6.1. This feature allows users to request dynamic shared
memory on target regions.
---------
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
This ammends the fix commited in https://reviews.llvm.org/D109770 /
6cf6fa6ef1c28
Comparing the number of template parameter lists with the number of
template parameters is obviously wrong.
Even then, the number of parameters being the same doesn't mean the
templates are compatible.
This change compares if the template parameters are actually equivalent.
This fixes the crash, but I am not sure what is the design and intention
here, this openmp template support looks too fragile.
The added test case still doesn't work, but at least we don't crash now.
This PR enhances the OpenMP `nowait` clause implementation by adding
support for optional argument in both parsing and semantic analysis
phases.
Reference:
1. OpenMP 6.0 Specification, page 481
Changes to support for array elements in reduction clause e.g.
"reduction (+:a[1])"
---------
Co-authored-by: Sunil Kuravinakop <kuravina@pe31.hpc.amslabs.hpecorp.net>
This change implements the fuse directive, `#pragma omp fuse`, as specified in the OpenMP 6.0, along with the `looprange` clause in clang.
This change also adds minimal stubs so flang keeps compiling (a full implementation in flang of this directive is still pending).
---------
Co-authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
This patch updates the parsing changes to handle the new syntax of the
`uses_allocators` clause as defined in OpenMP 5.2(Section 6.8).
```
// Case 1: Allocator without traits
// < 5.2 → error
// ≥ 5.2 → OK, empty traits set
#pragma omp target teams uses_allocators(cgroup_alloc)
// Case 2: Allocator with traits
// Old syntax (< 5.2):
#pragma omp target teams uses_allocators(cgroup_alloc(cgroup_traits))
// New syntax (≥ 5.2):
#pragma omp target teams uses_allocators(traits(cgroup_traits) : cgroup_alloc)
// Case 3: Multiple allocators
// Old syntax (< 5.2), comma-separated:
#pragma omp target teams uses_allocators(cgroup_alloc(cgroup_traits), aligned_alloc(aligned_traits))
// New syntax (≥ 5.2), semicolon-separated:
#pragma omp target teams uses_allocators(traits(cgroup_traits) : cgroup_alloc; traits(aligned_traits) : aligned_alloc)
```
---------
Co-authored-by: urvi-rav <urvi.rav@hpe.com>
Per OpenMP 6.0 specification, section 7.9.9
Argument keywords, page 291, L17
Semantics, page 292, L15-16
The behavior of 'private' should be described in the same manner as that
of 'firstprivate'
15 ... If implicit-behavior is firstprivate, 16 the attribute is a
data-sharing attribute of firstprivate.
Relevant OpenMP 6.0 issues
defaultmap clause new implicit-behavior 'private' should be documented
https://github.com/OpenMP/spec/issues/4571
Issue 4571: Add missing sentence about private to defaultmap
https://github.com/OpenMP/spec/pull/4577
Testing:
Updated 'defaultmap' error message and codegen LIT tests to verify
behavior of 'private' in OpenMP 6.0.
This is preparatory work for the implementation of `#pragma omp fuse` in
https://github.com/llvm/llvm-project/pull/139293
**Note**: this change builds on top of
https://github.com/llvm/llvm-project/pull/155848
This change adds an additional class to hold data that will be shared
between all loop transformations: those that apply to canonical loop
nests (the majority) and those that apply to canonical loop sequences
(`fuse` in OpenMP 6.0).
This class is not a statement by itself and its goal is to avoid having
to replicate information between classes.
Also simplfiy the way we handle the "generated loops" information as we
currently only need to know if it is zero or non-zero.
First of two patches split from original defaultmap PR:
https://github.com/llvm/llvm-project/pull/157767
Per OpenMP 6.0 specification, section 7.9.9
Argument keywords, page 291, L17
Additional information, page 291, L24-25
24 The value alloc may also be specified as implicit-behavior with
identical meaning to the value 25 storage.
Testing:
Updated 'defaultmap' error message and codegen LIT tests to verify
behavior in OpenMP 6.0.
This is preparatory work for the implementation of `#pragma omp fuse` in
https://github.com/llvm/llvm-project/pull/139293
Not all OpenMP loop transformations makes sense to make them inherit
from `OMPLoopBasedDirective`, in particular in OpenMP 6.0 'fuse' (to be
implemented later) is a transformation of a canonical loop sequence.
This change renames class `OMPLoopTransformationDirective` to
`OMPCanonicalLoopNestTransformationDirective` so we can reclaim that
name in a later change.
When a mapper is declared with an iterator variable inside the map
clause, it results in unintended behavior due to the iterator being
implicitly created but left uninitialized.
Testcase:
```
typedef struct myvec{
size_t len;
double *data;
} myvec_t;
#pragma omp declare mapper(id:myvec_t v) map( iterator( iterator_variable=0:v.len), tofrom: v.data[iterator_variable])
int main()
{
int errors = 0;
myvec_t s;
#pragma omp target map(mapper(id), to:s)
{
}
return 0;
}
```
The error we get while compiling this is:
```
/usr/lib64/gcc/x86_64-suse-linux/14/../../../../x86_64-suse-linux/bin/ld: /tmp/test-f70647.o: in function `.omp_mapper._ZTS5myvec.id':
test.cpp:(.text+0x21a): undefined reference to `iterator_variable'
/llvm-project/install/bin/clang-linker-wrapper: error: 'ld' failed
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
This patch tries to fix this by initializing the iterator variable to a
null constant.
---------
Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>
According to the OpenMP 6.0 specification, array sections with no length
and unknown size are considered assumed-size arrays. As of pull request
https://github.com/llvm/llvm-project/pull/148048
these types of array sections are allowed and can be specified in
clauses that allow array sections as list items. However, only two
clauses explicitly allow array sections that are assumed-size arrays:
- 'map' and 'use_device_addr'.
The other clauses that accept array sections do not explicitly accept
assumed-size arrays:
- inclusive, exclusive, has_device_addr, in_reduction, task_reduction,
reduction These cases should generate an error. See OpenMP 6.0
specification section 7.4 List Item Privatization, Restrictions, p. 222,
L15
Assumed-size arrays must not be privatized
For OpenMP 6.0, function getPrivateItem() now checks for array section
list items that are assumed-size arrays and generates an error if they
are not allowed for the clause.
Testing
- Updated LIT tests for assumed-size array sections to ensure these
clauses generate an error: inclusive, exclusive, has_device_addr,
in_reduction, task_reduction, reduction and that this clause is accepted
(codegen test): use_device_addr
- check-all
- OpenMP_VV (sollve_vv)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
In OpenMP 6.0 specification, section 5.2.5 Array Sections, page 166,
lines 28-28:
When the length is absent and the size of the dimension is not known,
the array section is an assumed-size array.
Testing
- Updated LIT test
- check-all
- OpenMP_VV (formerly sollve) test case
tests/6.0/target/test_target_assumed_array_size.c
If the need_device_addr adjust-op modifier is present, each list item
that appears in the clause must refer to an argument in the declaration
of the function variant that has a reference type.
Reference:
OpenMP 6.0 [Sec 9.6.2, page 332, line 31-33, adjust_args clause,
Restrictions]
Implement parsing and semantic analysis support for the message and
severity clauses that have been added to the parallel directive in
OpenMP 6.0, 12.1.
Implement parsing and semantic analysis support for the optional
'strict' modifier of the num_threads clause. This modifier has been
introduced in OpenMP 6.0, section 12.1.2.
Note: this is basically 1:1 https://reviews.llvm.org/D138328.
This patch is closely related to #139293 and addresses an existing issue
in the loop transformation codebase. Specifically, it corrects the
handling of the `NumGeneratedLoops` variable in
`OMPLoopTransformationDirective` AST nodes and its inheritors (such as
OMPUnrollDirective, OMPTileDirective, etc.).
Previously, this variable was inaccurately set for certain
transformations like reverse or tile. While this did not lead to
functional bugs, since the value was only checked to determine whether
it was greater than zero or equal to zero, the inconsistency could
introduce problems when supporting more complex directives in the
future.
This builds upon #101101 from @jyu2-git, which used compiler-generated
mappers when mapping an array-section of structs with members that have
user-defined default mappers.
Now we do the same when mapping arrays of structs.
Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.
Sample Test Case from OpenMP 6.0 Example
```
#include <assert.h>
#include <omp.h>
#define N 10
void do_red(int n, int *v, int &sum_v)
{
sum_v = 0; // sum_v is private
#pragma omp for reduction(original(private),+: sum_v)
for (int i = 0; i < n; i++)
{
sum_v += v[i];
}
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel num_threads(4)
{
int s_v; // s_v is private
do_red(N, v, s_v);
assert(s_v == 45);
}
return 0;
}
```
Expected Codegen:
```
// A shared global/static variable is introduced for the reduction result.
// This variable is initialized (e.g., using memset or a UDR initializer)
// e.g., .omp.reduction.internal_private_var
// Barrier before any thread performs combination
call void @__kmpc_barrier(...)
// Initialization block (executed by thread 0)
// e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)
call void @__kmpc_critical(...)
// Inside critical section:
// Load the current value from the shared variable
// Load the thread-local private variable's value
// Perform the reduction operation
// Store the result back to the shared variable
call void @__kmpc_end_critical(...)
// Barrier after all threads complete their combinations
call void @__kmpc_barrier(...)
// Broadcast phase:
// Load the final result from the shared variable)
// Store the final result to the original private variable in each thread
// Final barrier after broadcast
call void @__kmpc_barrier(...)
```
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
Summary:
When there are overloaded C++ operators in the global namespace the AST
node for these is not a `BinaryExpr` but a `CXXOperatorCallExpr`. Modify
the uses to handle this case, basically just treating it as a binary
expression with two arguments.
Fixes https://github.com/llvm/llvm-project/issues/141085
We weren't correctly handling size expressions with errors before trying
to get the type of the size expression.
No release note needed because support for 'stripe' was added to the
current release.
Fixes#139433
We were failing to pass a required argument when emitting the
diagnostic, so the source range was being used in place of an index.
This caused a failed assertion due to the incorrect index.
Fixes#139266