Fixes#153043.
This is another case of debug location not getting updated when the
insert point is changed by the `restoreIP`. Fixed by using the wrapper
function that updates the debug location.
This patch handles the strided update in the `#pragma omp target update
from(data[a🅱️c])` directive where 'c' represents the strided access
leading to non-contiguous update in the `data` array when the offloaded
execution returns the control back to host from device using the `from`
clause.
Issue: Clang CodeGen where info is generated for the particular
`MapType` (to, from, etc), it was failing to detect the strided access.
Because of this, the `MapType` bits were incorrect when passed to
runtime. This led to incorrect execution (contiguous) in the
libomptarget runtime code.
Added a minimal testcase that verifies the working of the patch.
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
We can derive and upgrade alignment for loads/stores using other
well-aligned loads/stores. This optimization does a single forward pass through
each basic block and uses loads/stores (the alignment and the offset) to
derive the best possible alignment for a base pointer, caching the
result. If it encounters another load/store based on that pointer, it
tries to upgrade the alignment. The optimization must be a forward pass within a basic
block because control flow and exception throwing can impact alignment guarantees.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
In OpenMP 6.0 specification, section 5.2.5 Array Sections, page 166,
lines 28-28:
When the length is absent and the size of the dimension is not known,
the array section is an assumed-size array.
Testing
- Updated LIT test
- check-all
- OpenMP_VV (formerly sollve) test case
tests/6.0/target/test_target_assumed_array_size.c
If the need_device_addr adjust-op modifier is present, each list item
that appears in the clause must refer to an argument in the declaration
of the function variant that has a reference type.
Reference:
OpenMP 6.0 [Sec 9.6.2, page 332, line 31-33, adjust_args clause,
Restrictions]
Let Clang emit `dead_on_return` attribute on pointer arguments
that are passed indirectly, namely, large aggregates that the
ABI mandates be passed by value; thus, the parameter is destroyed
within the callee. Writes to such arguments are not observable by
the caller after the callee returns.
This should desirably enable further MemCpyOpt/DSE optimizations.
Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
Implement parsing and semantic analysis support for the message and
severity clauses that have been added to the parallel directive in
OpenMP 6.0, 12.1.
According to OpenMP 5.2 (Section 7.8.2), the directive name `declare
target` may be used as a synonym for `begin declare target` only when no
clauses are specified. This clause-less delimited form is now deprecated
and should emit a deprecation warning.
```
// Deprecated usage (should trigger warning):
#pragma omp declare target // deprecated in OpenMP 5.2
void foo1() {
}
#pragma omp end declare target
// Valid usage with clause (should not trigger warning):
#pragma omp declare target enter(foo2)
void foo2() {
}
```
```
// Recommended replacement for deprecated delimited form:
#pragma omp begin declare target
void foo() {
}
#pragma omp end declare target
```
---------
Co-authored-by: urvi-rav <urvi.rav@hpe.com>
The output of the compile-and-run tests is incorrect. These will be used
for reference in future commits that resolve the issues.
Also updated the existing clang LIT test,
target_map_both_pointer_pointee_codegen.cpp, with more constructs and
fewer CHECKs (through more update_cc_test_checks filters).
OpenMP 6.0 introduced alternative spelling for some directives, with the
previous spellings still allowed.
Warn the user when a new spelling is encountered with OpenMP version set
to an older value.
Implement parsing and semantic analysis support for the optional
'strict' modifier of the num_threads clause. This modifier has been
introduced in OpenMP 6.0, section 12.1.2.
Note: this is basically 1:1 https://reviews.llvm.org/D138328.
Fix a crash that happened during parsing of a "declare mapper" construct
for a struct that contains an element for which we also declared a
custom default mapper.
This builds upon #101101 from @jyu2-git, which used compiler-generated
mappers when mapping an array-section of structs with members that have
user-defined default mappers.
Now we do the same when mapping arrays of structs.
Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.
Sample Test Case from OpenMP 6.0 Example
```
#include <assert.h>
#include <omp.h>
#define N 10
void do_red(int n, int *v, int &sum_v)
{
sum_v = 0; // sum_v is private
#pragma omp for reduction(original(private),+: sum_v)
for (int i = 0; i < n; i++)
{
sum_v += v[i];
}
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel num_threads(4)
{
int s_v; // s_v is private
do_red(N, v, s_v);
assert(s_v == 45);
}
return 0;
}
```
Expected Codegen:
```
// A shared global/static variable is introduced for the reduction result.
// This variable is initialized (e.g., using memset or a UDR initializer)
// e.g., .omp.reduction.internal_private_var
// Barrier before any thread performs combination
call void @__kmpc_barrier(...)
// Initialization block (executed by thread 0)
// e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)
call void @__kmpc_critical(...)
// Inside critical section:
// Load the current value from the shared variable
// Load the thread-local private variable's value
// Perform the reduction operation
// Store the result back to the shared variable
call void @__kmpc_end_critical(...)
// Barrier after all threads complete their combinations
call void @__kmpc_barrier(...)
// Broadcast phase:
// Load the final result from the shared variable)
// Store the final result to the original private variable in each thread
// Final barrier after broadcast
call void @__kmpc_barrier(...)
```
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
Summary:
When there are overloaded C++ operators in the global namespace the AST
node for these is not a `BinaryExpr` but a `CXXOperatorCallExpr`. Modify
the uses to handle this case, basically just treating it as a binary
expression with two arguments.
Fixes https://github.com/llvm/llvm-project/issues/141085
The PR139793 added handling of the Fortran-only "workshare" directive,
however there are more such directives, e.g. "allocators". Use the
genDirectiveLanguages function to detect non-C/C++ directives instead of
enumerating them.
There were two crashes that have the same root cause: not correctly
handling unexpected tokens. In one case, we were failing to return early
which caused us to parse a paren as a regular token instead of a special
token, causing an assertion. The other case was failing to commit or
revert the tentative parse action when not getting a paren when one was
expected.
Fixes#139665
The "workshare" construct is only present in Fortran. The common OpenMP
code does treat it as any other directive, but in clang we need to
reject it, and do so gracefully before it encounters an internal
assertion.
Fixes https://github.com/llvm/llvm-project/issues/139424
We weren't correctly handling size expressions with errors before trying
to get the type of the size expression.
No release note needed because support for 'stripe' was added to the
current release.
Fixes#139433
If the next token after 'cancel' is a special token, we would trigger an
assertion. We should be consuming any token, same as elsewhere in the
function.
Note, we could check for an unknown directive and do different error
recovery, but that caused too many behavioral changes for other tests in
the form of "unexpected tokens ignored" diagnostics that didn't seem
like an improvement for the test cases.
Fixes#139360
We are missing a few declerative directives in the parser for executable
and declerative directives causing us to error out if they are inside of
functions. This adds support for begin/end declare variant by reusing
the logic we used in global scope.
The device and implementation set should trigger elision of tokens if
they do not match statically in a begin/end declare variant. This simply
extends the logic from the device set only and includes the
implementation set.
Reported by @kkwli.
We were failing to pass a required argument when emitting the
diagnostic, so the source range was being used in place of an index.
This caused a failed assertion due to the incorrect index.
Fixes#139266
We were trying to get type information out of an expression node which
contained errors. That causes the type of the expression to be
dependent, which the code was not expecting. Now we handle error
conditions with an early return.
Fixes#139073
This patch enhances Clang's diagnosis of an unknown attribute by
printing the attribute's namespace in the diagnostic text. e.g.,
```cpp
[[foo::nodiscard]] int f(); // warning: unknown attribute 'foo::nodiscard' ignored
```
This adds a new diagnostic group, -Wc++-keyword, which is off by default
and grouped under -Wc++-compat. The diagnostic catches use of C++
keywords in C code.
This change additionally fixes an issue with -Wreserved-identifier not
diagnosing use of reserved identifiers in function parameter lists in a
function declaration which is not a definition.
Fixes https://github.com/llvm/llvm-project/issues/21898
This PR replaces the `default` clause with the `otherwise` clause for
the `metadirective` in OpenMP. The `otherwise` clause serves as a
fallback condition when no directive from the when clauses is selected.
In the `when` clause, context selectors define traits evaluated to
determine the directive to be applied.
The previous merge was reverted due to a failing test case, which has
now been resolved.
---------
Co-authored-by: urvi-rav <urvi.rav@hpe.com>
When we get a function back from `CodeExtractor`, we discard its entry
block after coping its instructions into the entry block we prepared.
While copying the instructions, the terminator is discarded for obvious
reasons. But if there were some debug values attached to the terminator,
those are useful and needs to be copied.
OpenMP has restrictions on directives allowed to be strictly nested
inside a
construct with the order(concurrent) clause specified.
- OpenMP 5.0, 5.1, and 5.2 allows: 'loop', 'parallel', 'simd', and
combined directives starting with 'parallel'.
- OpenMP 6.0 allows: the above directives plus 'atomic' and
all loop-transformation directives.
Furthermore, a region that corresponds to a construct with
order(concurrent)
specified may not contain calls to the OpenMP runtime API.
This PR fixes the following issues in the current implementation:
With -fopenmp-version=50: none of the nesting restrictions above were
enforced
With -fopenmp-version=60:
1. Clang did not reject OpenMP runtime APIs encountered in the region.
2. Clang erroneously rejected combined directives starting with
parallel.
---------
Co-authored-by: Zahira Ammarguellat <zahira.ammarguellat@intel.com>
Just to get some more coverage.
Some of the behavior might be weird and change in the future, but let's
lock down what happens today to at least prevent regressions.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.
This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).
So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
Initial Parse/Sema support for reduction over private variable with
reduction clause.
Section 7.6.10 in in OpenMP 6.0 spec.
- list item in a reduction clause can now be private in the enclosing
context.
- Added support for _original-sharing-modifier_ with reduction clause.
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
Previously an extra block was created by splitting the previous exit
block. This produced incorrect results when the outlined region
statically never terminated because then there wouldn't be a valid exit
block for the outlined region, this caused this newly added block to
have an incoming edge from outside of the outlining region, which caused
outlining to fail.
So far as I can tell this extra block no longer serves any purpose. The
comment says it is supposed to collate multiple control flow edges into
one place, but the code as it is now does not achieve this. In fact, as
can be seen from the changes to lit tests, this block was not actually
outlined in the end. This is because there are actually two code
extractors: one in the callback for creating a parallel op which is used
to find what the input/output variables are (which does have this block
added to it), and another one which actually does the outlining (which
this block was not added to).
Tested with the gfortran and fujitsu test suites.
Fixes#112884
Compiling this:
`int main() {`
` #pragma omp metadirective when(use r= {condition(0)}`
`: parallel for)`
`for (int i=0; i<10; i++)`
;
}`
is generating an error:
`error: expected expression`
The compiler is interpreting this as if it's compiling a `#pragma omp
metadirective` with no `otherwise` clause.
In the OMP5.2 specs chapter 7.4 it's mentioned that:
`If no otherwise clause is specified the effect is as if one was
specified without an associated directive variant.`
This patch fixes the issue.