The standard is ambiguous, but we can only support
arrays/array-sections/etc of the composite type, so make sure we enforce
the rule that way. This will better support how we need to do lowering.
Like a big dummy, I completely skipped running this test locally and
forgot it would need check lines. *sigh*, Looks like SOMEONE has a case
of the Mondays!
Anyway, this patch fixes it by adding the proper verify lines.
As reported, OpenACC's variable declaration handling was assuming some
semblence of legality in the example, so it didn't properly handle an
error case. This patch fixes its assumptions so that we don't crash.
Fixes#154008
The OpenACC standard is going to change to clarify that init, shutdown,
and set should only have a single architecture in each 'device_type'
clause. This patch implements that restriction.
See: https://github.com/OpenACC/openacc-spec/pull/550
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
private/firstprivate typically do copy operations, however copying a VLA
isn't really possible. This patch introduces a warning to alert the
person that this copy isn't happening correctly.
As a future direction, we MIGHT consider doing additional work to make
sure they are initialized/copied/deleted/etc correctly.
Under C++17, `DeletedCopy` and `DefaultedCopy` in the test
`clang/test/SemaOpenACC/private_firstprivate_reduction_required_ops.cpp`
are eligible for aggregate initialization. Under C++20 and up, that is
no longer the case. Therefore, an explicit constructor must be declared
to avoid additional diagnostics which the test does not expect.
This test was failing in our downstream testing where our toolchain uses
C++20 as the default dialect.
See this reduced example for more details:
https://godbolt.org/z/46oErYT43
These would not give a correct initializer, but they are not possible
to generate correctly anyway, so this patch makes sure we look through
the array type to correctly diagnose these.
'firstprivate' can't be generated unless we have a copy constructor, so
this patch implements the restriction as a warning, and prevents the
item from being added to the AST.
These two both allow arrays as their variable references, but it is a
common thing to use sub-arrays as a way to get a pointer to act as an
array with other compilers. This patch adds these, with an
extension-warning.
Running an external test suite (UDel) showed that our expression
comparison for the 'cache' rule checking was overly strict in the
presence of irrelevant parens/casts/etc. This patch ensures we skip
them when checking.
This also changes the diagnostic to say 'sub-expression' instead of
variable, which is more correct.
A 'private' variable reference needs to have a default constructor and a
destructor, else we cannot properly emit them in codegen. This patch
adds a warning-as-default-error to diagnose this.
We'll have to do something similar for firstprivate/reduction, however
it isn't clear whether we could skip the check for default-constructor
for those two (they still need a destructor!). Depending on how we
intend to create them (and we probably have to figure this out?), we
could either require JUST a copy-constructor (then make the init section
just the alloca, and the copy-ctor be the 'copy' section), OR they
require a default-constructor + copy-assignment.
The 'cache' construct is lowered as marking the acc.loop in ACC MLIR.
This results in any variable references that are not inside of the
acc.loop being invalid. This patch adds a warning to that effect, and
ensures that the variable references won't be added to the AST during
parsing so we don't try to lower them.
This results in loss of instantiation-diagnostics for these, however
that seems like an acceptable consequence to ignoring it.
While doing lowering, I discovered that the restriction onthe allowed
modifiers for 'copyout' didn't make sense! After discussion on the
OpenACC standards mailing list I discovered that this was a copy/paste
error during standardization that they intend to fix, and really meant
for copyout to allow alwaysout instead of alwaysin.
When implementing, I blindly followed the standard :)
This patch corrects the implementation to do what was meant.
The 'capture' modifier is an OpenACC 3.3NEXT (AKA 3.4) feature, which
permits a new kind of identifying the memory location of variables in a
data clause. However, it is only valid on data, combined, or compute
constructs.
This patch implements all of the proper restrictions.
The code to analyze VarDecls for the purpose of ensuring a magic-static
isn't present in a 'routine' was getting confused/crashed because we
create something that looks like a magic-static during error-recovery,
but it is still an invalid decl.
This patch causes us to just 'give up' in the case where the vardecl is
already invalid.
Fixes: #140920
In a sub-subscript of an array-section, it is actually an array section.
So make sure we get the location correct when there isn't a 'colon' to
look at.
For some reason when implementing 'vector' I didn't include switch
entries for the combined constructs. I audited the rest of the uses of
this pattern, and got it right everywhere else, so I'm not sure why I
missed it here.
Fixes: #140339
When implementing the parsing for OpenACC I ensured that we could always
continue to do diagnostics/etc. For the most part, I was consistent
with that assumption throughout clause Sema, but in 3 cases I left in
some unreachables for cases where this would happen. I've now properly
handled all 3 in a reasonable way.
Fixes: #139894
Brought up in a previous review as a TODO, we could be better about how
we highlight what hte previous clause was, and how to show that the
'device_type' is the one being targetted. This patch rewords the
diagnostics and updates a massive number of tests.
Previously we just checked duplicates for a handful of clauses between
active device_type clauses. However, after looking into the
implementation for 'collapse', I discovered that duplicate device_type
identifiers with duplicate versions of htese clause are problematic.
This patch corrects this and now catches these conflicts.
The 'loop' construct has some limitations that are not particularly
clear on what can be in the for-loop. This patch adds some restriction
so that the increment can only be a addition or subtraction operation,
BUT starts allowing Itr = Itr +/- N forms.
The 'loop' construct has some pretty strict checks as to what the for
loop associated with it has to look like. The previous implementation
was a little convoluted and missed implementing the 'condition'
checking. Additionally, it did a bad job double-diagnosing with
templates.
This patch rewrites it in a way that does a much better job with the
double-diagnosing, and proeprly implements some checks for the
'condition' of the for loop.
Async acts just like num_workers/vector_length in that it gets a new
variant per device_type and is lowered as an operand.
However, it has one additional complication, in that it can have a
variant that has no argument, which produces an attribute with the
correct devicetype.
Additionally, this syncronizes us with the implementation of flang,
which prohibits multiple 'async' clauses per-device_type.
…uses
The Flang implemenation of OpenACC uses a .td file in the llvm/Frontend
directory to determine appertainment in 4 categories:
-Required: If this list has items in it, the directive requires at least
1 of these be present.
-AllowedExclusive: Items on this list are all allowed, but only 1 from
the list may be here (That is, they are exclusive of eachother).
-AllowedOnce: Items on this list are all allowed, but may not be
duplicated.
Allowed: Items on this list are allowed. Note th at the actual list of
'allowed' is all 4 of these lists together.
This is a draft patch to swtich Clang over to use those tables. Surgery
to get this to happen in Clang Sema was somewhat reasonable. However,
some gaps in the implementations are obvious, the existing clang
implementation disagrees with the Flang interpretation of it. SO, we're
keeping a task list here based on what gets discovered.
Changes to Clang:
- [x] Switch 'directive-kind' enum conversions to use tablegen See
ff1a7bddd9435b6ae2890c07eae60bb07898bbf5
- [x] Switch 'clause-kind' enum conversions to use tablegen See
ff1a7bddd9435b6ae2890c07eae60bb07898bbf5
- [x] Investigate 'parse' test differences to see if any new
disagreements arise.
- [x] Clang/Flang disagree as to whether 'collapse' can be multiple
times on a loop. Further research showed no prose to limit this, and the
comment on the clang implementation said "no good reason to allow", so
no standards justification.
- [x] Clang/Flang disagree whether 'num_gangs' can appear >1 on a
compute/combined construct. This ended up being an unjustified
restriction.
- [x] Clang/Flang disagree as to the list of required clauses on a 'set'
construct. My research shows that Clang mistakenly included 'if' in the
list, and that it should be just 'default_async', 'device_num', and
'device_type'.
- [x] Order of 'at least one of' diagnostic has changed. Tests were
updated.
- [x] Ensure we are properly 'de-aliasing' clause names in appertainment
checks?
- [x] What is 'shortloop'? 'shortloop' seems to be an old non-standard
extension that isn't supported by flang, but is parsed for backward
compat reasons. Clang won't parse, but we at least have a spot for it in
the clause list.
- [x] Implemented proposed change for 'routine' gang/worker/vector/seq.
(see issue 539)
- [x] Implement init/shutdown can only have 1 'if' (see issue 540)
- [x] Clang/Flang disagree as to whether 'tile' is permitted more than
once on a 'loop' or combined constructs (Flang prohibits >1). I see no
justification for this in the standard. EDIT: I found a comment in clang
that I did this to make SOMETHING around duplicate checks easier.
Discussion showed we should actually have a better behavior around
'device_type' and duplicates, so I've since implemented that.
- [x] Clang/Flang disagree whether 'gang', 'worker', or 'vector' may
appear on the same construct as a 'seq' on a 'loop' or 'combined'. There
is prose for this in 2022: (a gang, worker, or vector clause may not
appear if a 'seq' clause appears). EDIT: These don't actually disagree,
but aren't in the .td file, so I restored the existing code to do this.
- [x] Clang/Flang disagree on whether 'bind' can appear >1 on a
'routine'. I believe line 3096 (A bind clause may not bind to a routine
name that has a visible bind clause) makes this limitation (Flang
permits >1 bind). we discussed and decided this should have the same
rules as worker/vector/etc, except without the 'exactly 1 of' rule (so
no dupes in individual sections).
- [x] Clang/Flang disagree on whether 'init'/'shutdown' can have
multiple 'device_num' clauses. I believe there is no supporting prose
for this limitation., We decided that `device_num` should only happen
1x.
- [x] Clang/Flang disagree whether 'num_gangs' can appear >1 on a
'kernels' construct. Line 1173 (On a kernels construct, the num_gangs
clause must have a single argument) justifies limiting on a
per-arguement basis, but doesn't do so for multiple num_gangs clauses.
WE decided to do this with the '1-per-device-type' region for num_gangs,
num_workers, and vector_length, see openacc bug here:
https://github.com/OpenACC/openacc-spec/issues/541
Changes to Flang:
- [x] Clang/Flang disgree on whether 'atomic' can take an 'if' clause.
This was added in OpenACC3.3_Next See #135451
- [x] Clang/Flang disagree on whether 'finalize' can be allowed >1 times
on a 'exit_data' construct. see #135415.
- [x] Clang/Flang disagree whether 'if_present' should be allowed >1
times on a 'host_data'/'update' construct. see #135422
- [x] Clang/Flang disagree on whether 'init'/'shutdown' can have
multiple 'device_type' clauses. I believe there is no supporting prose
for this limitation.
- [ ] SEE change for num_gangs/etc above.
Changes that need discussion/research:
Discussions with the OpenACC Standard folks and the dialect folks showed
that the ability to have 'set' have a 'device_type' with more than one
architecture was a mistake, and one that will be fixed in future
revisions of the standard. Since the dialect requires this anyway,
we'll implement this in advance of standardization.
Researching in prep of doing the implementation for lowering, I found
that the source of the valid identifiers list from flang is in the
frontend. This patch adds the same list to the frontend, but does it as
a sema diagnostic, so we still parse it as an identifier/identifier-like
thing, but then diagnose it as invalid later.
OpenACC 3.3-NEXT has changed the way tags for copy, copyin, copyout, and
create clauses are specified, and end up adding a few extras, and
permits them as a list. This patch encodes these as bitmask enum so
they can be stored succinctly, but still diagnose reasonably.
OpenACC Github PR#499 defines the pqr-list as having at least 1 item. We
already handle that for all but 'wait', so this patch just does the work
to add it for 'wait', plus adds tests.
This was added in OpenACC PR #511 in the 3.4 branch. From an AST/Sema
perspective this is pretty trivial as the infrastructure for 'if'
already exists, however the atomic construct needed to be taught to take
clauses. This patch does that and adds some testing to do so.
Original PR: #130537
Originally reverted due to revert of dependent commit. Relanding with no
changes.
This changes the MemberPointerType representation to use a
NestedNameSpecifier instead of a Type to represent the base class.
Since the qualifiers are always parsed as nested names, there was an
impedance mismatch when converting these back and forth into types, and
this led to issues in preserving sugar.
The nested names are indeed a better match for these, as the differences
which a QualType can represent cannot be expressed syntatically, and
they represent the use case more exactly, being either dependent or
referring to a CXXRecord, unqualified.
This patch also makes the MemberPointerType able to represent sugar for
a {up/downcast}cast conversion of the base class, although for now the
underlying type is canonical, as preserving the sugar up to that point
requires further work.
As usual, includes a few drive-by fixes in order to make use of the
improvements.
This is the last item of the OpenACC 3.3 spec. It includes the
implicit-name version of 'routine', plus significant refactorings to
make the two work together. The implicit name version is represented as
an attribute on the function call. This patch also implements the
clauses for the implicit-name version, as well as the A.3.4 warning.
Original PR: #130537
Reland after updating lldb too.
This changes the MemberPointerType representation to use a
NestedNameSpecifier instead of a Type to represent the base class.
Since the qualifiers are always parsed as nested names, there was an
impedance mismatch when converting these back and forth into types, and
this led to issues in preserving sugar.
The nested names are indeed a better match for these, as the differences
which a QualType can represent cannot be expressed syntatically, and
they represent the use case more exactly, being either dependent or
referring to a CXXRecord, unqualified.
This patch also makes the MemberPointerType able to represent sugar for
a {up/downcast}cast conversion of the base class, although for now the
underlying type is canonical, as preserving the sugar up to that point
requires further work.
As usual, includes a few drive-by fixes in order to make use of the
improvements.
This changes the MemberPointerType representation to use a
NestedNameSpecifier instead of a Type to represent the class.
Since the qualifiers are always parsed as nested names, there was an
impedance mismatch when converting these back and forth into types, and
this led to issues in preserving sugar.
The nested names are indeed a better match for these, as the differences
which a QualType can represent cannot be expressed syntactically, and it
also represents the use case more exactly, being either dependent or
referring to a CXXRecord, unqualified.
This patch also makes the MemberPointerType able to represent sugar for
a {up/downcast}cast conversion of the base class, although for now the
underlying type is canonical, as preserving the sugar up to that point
requires further work.
As usual, includes a few drive-by fixes in order to make use of the
improvements, and removing some duplications, for example
CheckBaseClassAccess is deduplicated from across SemaAccess and
SemaCast.
OpenACC PR 475(targetting OpenACC3.4) added support for the _Pragma
spelling of an OpenACC pragma. We already implemented this, as clang
doesn't really differentiate between the spellings (so we did it as an
inadvertent extension).
This patch adds a few spot-check tests to make sure we support this
spelling and that it results in the same AST as the traditional
spelling.
The 'bind' clause allows the renaming of a function during code
generation. There are a few rules about when this can/cannot happen,
and it takes either a string or identifier (previously mis-implemetned
as ID-expression) argument.
Note there are additional rules to this in the implicit-function routine
case, but that isn't implemented in this patch, as implicit-function
routine is not yet implemented either.
There is a slightly different list for routine on which clauses are
permitted after it (like the rest of the constructs), but this
implements and tests them to make sure we get them right.
'nohost' is only valid on routine, and states that the compiler
shouldn't compile this routine for the host. It has no arguments, so no
checking is required besides putting it in the AST.
These 4 clauses are mutually exclusive, AND require at least one of
them. Additionally, gang has some additional restrictions in that only
the 'dim' specifier is permitted. This patch implements all of this, and
ends up refactoring the handling of each of these clauses for
readabililty.
We previously allowed duplicates of auto/seq/independent on a 'loop'
construct. This is disallowed by the restriction (which says exactly one
of...), so this patch ensures they are disallowed.
The 'routine' construct has two forms, one which takes the name of a
function that it applies to, and another where it implicitly figures it
out based on the next declaration. This patch implements the former with
the required restrictions on the name and the function-static-variables
as specified.
What has not been implemented is any clauses for this, any of the A.3.4
warnings, or the other form.