The 'cache' construct is lowered as marking the acc.loop in ACC MLIR.
This results in any variable references that are not inside of the
acc.loop being invalid. This patch adds a warning to that effect, and
ensures that the variable references won't be added to the AST during
parsing so we don't try to lower them.
This results in loss of instantiation-diagnostics for these, however
that seems like an acceptable consequence to ignoring it.
OpenACC 3.4 will permit extension clauses, which are clauses that start
with '__', and contain an optional balanced-paren-token-sequence inside
of parens.
This patch ensures we consume these, and emit a warning instead of an
error for unsupported clauses.
I'd misunderstood how the ParseStringLiteralExpression function worked,
so I assumed it would catch non-string literals, however it instead
asserted. This patch now checks for that case and diagnoses.
Fixes: #139346
Brought up in a previous review as a TODO, we could be better about how
we highlight what hte previous clause was, and how to show that the
'device_type' is the one being targetted. This patch rewords the
diagnostics and updates a massive number of tests.
…uses
The Flang implemenation of OpenACC uses a .td file in the llvm/Frontend
directory to determine appertainment in 4 categories:
-Required: If this list has items in it, the directive requires at least
1 of these be present.
-AllowedExclusive: Items on this list are all allowed, but only 1 from
the list may be here (That is, they are exclusive of eachother).
-AllowedOnce: Items on this list are all allowed, but may not be
duplicated.
Allowed: Items on this list are allowed. Note th at the actual list of
'allowed' is all 4 of these lists together.
This is a draft patch to swtich Clang over to use those tables. Surgery
to get this to happen in Clang Sema was somewhat reasonable. However,
some gaps in the implementations are obvious, the existing clang
implementation disagrees with the Flang interpretation of it. SO, we're
keeping a task list here based on what gets discovered.
Changes to Clang:
- [x] Switch 'directive-kind' enum conversions to use tablegen See
ff1a7bddd9435b6ae2890c07eae60bb07898bbf5
- [x] Switch 'clause-kind' enum conversions to use tablegen See
ff1a7bddd9435b6ae2890c07eae60bb07898bbf5
- [x] Investigate 'parse' test differences to see if any new
disagreements arise.
- [x] Clang/Flang disagree as to whether 'collapse' can be multiple
times on a loop. Further research showed no prose to limit this, and the
comment on the clang implementation said "no good reason to allow", so
no standards justification.
- [x] Clang/Flang disagree whether 'num_gangs' can appear >1 on a
compute/combined construct. This ended up being an unjustified
restriction.
- [x] Clang/Flang disagree as to the list of required clauses on a 'set'
construct. My research shows that Clang mistakenly included 'if' in the
list, and that it should be just 'default_async', 'device_num', and
'device_type'.
- [x] Order of 'at least one of' diagnostic has changed. Tests were
updated.
- [x] Ensure we are properly 'de-aliasing' clause names in appertainment
checks?
- [x] What is 'shortloop'? 'shortloop' seems to be an old non-standard
extension that isn't supported by flang, but is parsed for backward
compat reasons. Clang won't parse, but we at least have a spot for it in
the clause list.
- [x] Implemented proposed change for 'routine' gang/worker/vector/seq.
(see issue 539)
- [x] Implement init/shutdown can only have 1 'if' (see issue 540)
- [x] Clang/Flang disagree as to whether 'tile' is permitted more than
once on a 'loop' or combined constructs (Flang prohibits >1). I see no
justification for this in the standard. EDIT: I found a comment in clang
that I did this to make SOMETHING around duplicate checks easier.
Discussion showed we should actually have a better behavior around
'device_type' and duplicates, so I've since implemented that.
- [x] Clang/Flang disagree whether 'gang', 'worker', or 'vector' may
appear on the same construct as a 'seq' on a 'loop' or 'combined'. There
is prose for this in 2022: (a gang, worker, or vector clause may not
appear if a 'seq' clause appears). EDIT: These don't actually disagree,
but aren't in the .td file, so I restored the existing code to do this.
- [x] Clang/Flang disagree on whether 'bind' can appear >1 on a
'routine'. I believe line 3096 (A bind clause may not bind to a routine
name that has a visible bind clause) makes this limitation (Flang
permits >1 bind). we discussed and decided this should have the same
rules as worker/vector/etc, except without the 'exactly 1 of' rule (so
no dupes in individual sections).
- [x] Clang/Flang disagree on whether 'init'/'shutdown' can have
multiple 'device_num' clauses. I believe there is no supporting prose
for this limitation., We decided that `device_num` should only happen
1x.
- [x] Clang/Flang disagree whether 'num_gangs' can appear >1 on a
'kernels' construct. Line 1173 (On a kernels construct, the num_gangs
clause must have a single argument) justifies limiting on a
per-arguement basis, but doesn't do so for multiple num_gangs clauses.
WE decided to do this with the '1-per-device-type' region for num_gangs,
num_workers, and vector_length, see openacc bug here:
https://github.com/OpenACC/openacc-spec/issues/541
Changes to Flang:
- [x] Clang/Flang disgree on whether 'atomic' can take an 'if' clause.
This was added in OpenACC3.3_Next See #135451
- [x] Clang/Flang disagree on whether 'finalize' can be allowed >1 times
on a 'exit_data' construct. see #135415.
- [x] Clang/Flang disagree whether 'if_present' should be allowed >1
times on a 'host_data'/'update' construct. see #135422
- [x] Clang/Flang disagree on whether 'init'/'shutdown' can have
multiple 'device_type' clauses. I believe there is no supporting prose
for this limitation.
- [ ] SEE change for num_gangs/etc above.
Changes that need discussion/research:
Researching in prep of doing the implementation for lowering, I found
that the source of the valid identifiers list from flang is in the
frontend. This patch adds the same list to the frontend, but does it as
a sema diagnostic, so we still parse it as an identifier/identifier-like
thing, but then diagnose it as invalid later.
OpenACC 3.3-NEXT has changed the way tags for copy, copyin, copyout, and
create clauses are specified, and end up adding a few extras, and
permits them as a list. This patch encodes these as bitmask enum so
they can be stored succinctly, but still diagnose reasonably.
OpenACC Github PR#499 defines the pqr-list as having at least 1 item. We
already handle that for all but 'wait', so this patch just does the work
to add it for 'wait', plus adds tests.
This was added in OpenACC PR #511 in the 3.4 branch. From an AST/Sema
perspective this is pretty trivial as the infrastructure for 'if'
already exists, however the atomic construct needed to be taught to take
clauses. This patch does that and adds some testing to do so.
This is the last item of the OpenACC 3.3 spec. It includes the
implicit-name version of 'routine', plus significant refactorings to
make the two work together. The implicit name version is represented as
an attribute on the function call. This patch also implements the
clauses for the implicit-name version, as well as the A.3.4 warning.
The 'bind' clause allows the renaming of a function during code
generation. There are a few rules about when this can/cannot happen,
and it takes either a string or identifier (previously mis-implemetned
as ID-expression) argument.
Note there are additional rules to this in the implicit-function routine
case, but that isn't implemented in this patch, as implicit-function
routine is not yet implemented either.
These 4 clauses are mutually exclusive, AND require at least one of
them. Additionally, gang has some additional restrictions in that only
the 'dim' specifier is permitted. This patch implements all of this, and
ends up refactoring the handling of each of these clauses for
readabililty.
The 'routine' construct has two forms, one which takes the name of a
function that it applies to, and another where it implicitly figures it
out based on the next declaration. This patch implements the former with
the required restrictions on the name and the function-static-variables
as specified.
What has not been implemented is any clauses for this, any of the A.3.4
warnings, or the other form.
This statement level construct takes no clauses and has no associated
statement, and simply labels a number of array elements as valid for
caching. The implementation here is pretty simple, but it is a touch of
a special case for parsing, so the parsing code reflects that.
The 'declare' construct is the first of two 'declaration' level
constructs, so it is legal in any place a declaration is, including as a
statement, which this accomplishes by wrapping it in a DeclStmt. All
clauses on this have a 'same scope' requirement, which this enforces as
declaration context instead, which makes it possible to implement these
as a template.
The 'link' and 'device_resident' clauses are also added, which have some
similar/small restrictions, but are otherwise pretty rote.
This patch implements all of the above.
The atomic construct is a particularly complicated one. The directive
itself is pretty simple, it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.
'read' accepts:
v = x;
'write' accepts:
x = expr;
'update' (or no clause) accepts:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
'capture' accepts either a compound statement, or:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
IF 'capture' has a compound statement, it accepts:
{v = x; x binop= expr; }
{x binop= expr; v = x; }
{v = x; x = x binop expr; }
{v = x; x = expr binop x; }
{x = x binop expr ;v = x; }
{x = expr binop x; v = x; }
{v = x; x = expr; }
{v = x; x++; }
{v = x; ++x; }
{x++; v = x; }
{++x; v = x; }
{v = x; x--; }
{v = x; --x; }
{x--; v = x; }
{--x; v = x; }
While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.
This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
This completes the implementation of 'update' by implementing its last
restriction. This restriction requires at least 1 of the 'self', 'host',
or 'device' clauses.
These two clauses just take a 'var-list' and specify where the variables
should be copied from/to. This patch implements the AST nodes for them
and ensures they properly take a var-list.
The 'self' clause is an unfortunately difficult one, as it has a
significantly different meaning between 'update' and the other
constructs. This patch introduces a way for the 'self' clause to work
as both. I considered making this two separate AST nodes (one for
'self' on 'update' and one for the others), however this makes the
automated macros/etc for supporting a clause break.
Instead, 'self' has the ability to act as either a condition or as a
var-list clause. As this is the only one of its kind, it is implemented
all within it. If in the future we have more that work like this, we
should consider rewriting a lot of the macros that we use to make
clauses work, and make them separate ast nodes.
This executable construct has a larger list of clauses than some of the
others, plus has some additional restrictions. This patch implements
the AST node, plus the 'cannot be the body of a if, while, do, switch,
or label' statement restriction. Future patches will handle the
rest of the restrictions, which are based on clauses.
A fairly simple one, only valid on the 'set' construct, this clause
takes an int expression. Most of the work was already done as a part of
parsing, so this patch ends up being a lot of infrastructure.
The 'set' construct is another fairly simple one, it doesn't have an
associated statement and only a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its for clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled, it needs a complete
new implementation.
This is a very simple sema implementation, and just required AST node
plus the existing diagnostics. This patch adds tests and adds the AST
node required, plus enables it for 'init' and 'shutdown' (only!)
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.
All 4 of the 'data' constructs have a requirement that at least 1 of a
small list of clauses must appear on the construct. This patch
implements that restriction, and updates all of the tests it takes to
do so.
This is a clause that is only valid on 'host_data' constructs, and
identifies variables which it should use the current device address.
From a Sema perspective, the only thing novel here is mild changes to
how ActOnVar works for this clause, else this is very much like the rest
of the 'var-list' clauses.
'delete' is another clause that has very little compile-time
implication, but needs a full AST that takes a var list. This patch
ipmlements it fully, plus adds sufficient test coverage.
This is another new clause specific to 'exit data' that takes a pointer
argument. This patch implements this the same way we do a few other
clauses (like attach) that have the same restrictions.
The 'if_present' clause controls the replacement of addresses in the
var-list in current device memory. This clause can only go on
'host_device'. From a Sema perspective, there isn't anything to do
beyond add this to AST and pass it on.
This is a very simple clause as far as sema is concerned. It is only
valid on 'exit data', and doesn't have any rules involving it, so it is
simply applied and passed onto the MLIR.
This is once again simply enabling this for 'data', 'enter data', and
'exit data' (and ensuring we error for 'host_data'). Implementation is
very simply to enable it rather than emit the not-implemented
diagnostic.
These constructs are all very similar and closely related, so this patch
creates the AST nodes for them, serialization, printing/etc.
Additionally the restrictions are all added as tests/todos in the tests,
as those will have to be implemented once we get those clauses implemented.
This didn't end up being properly tested, but 'delete' as a keyword
causes us to not properly recognize it as a clause kind. This patch
correctly adds the work to make sure it is recognized correctly.
These three are identical to the version on compute constructs, so this
patch implements the tests for it, and ensures that we properly validate
it against all the other clauses we're supposed to. The test is mostly
a mock-up at the moment, since most other clauses aren't implemented
yet for 'loop'.
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
OpenACC restricts the contents of a 'for' loop affected by a 'loop'
construct without a 'seq'. The loop variable must be integer, pointer,
or random-access-iterator, it must monotonically increase/decrease, and
the trip count must be computable at runtime before the function.
This patch tries to implement some of these limitations to the best of
our ability, though it causes us to be perhaps overly restrictive at the
moment. I expect we'll revisit some of these rules/add additional
supported forms of loop-variable and 'monotonically increasing' here,
but the currently enforced rules are heavily inspired by the OMP
implementation here.
The 'vector' clause specifies the iterations to be executed in vector or
SIMD mode. There are some limitations on which associated compute
contexts may be associated with this and have arguments, but otherwise
this is a fairly unrestricted clause.
It DOES have region limits like 'gang' and 'worker'.
The worker clause specifies iterations of the loop/ that are executed in
parallel by distributing the iterations among the multiple works within
a single gang.
The sema rules for this type are simply that it cannot be combined with
a `kernel` construct with a `num_workers` clause, child `loop` clauses
cannot contain a `gang` or `worker` clause, and that the argument is oly
allowed when associated with a `kernel`.
The 'gang' clause is used to specify parallel execution of loops, thus
has some complicated rules depending on the 'loop's associated compute
construct. This patch implements all of those.
the 'tile' clause requires that it be followed by N (where N is the
number of size expressions) 'tightly nested loops'. This means the
same as it does in 'collapse', so much of the implementation is
simliar/shared with that.
The 'tile' clause shares quite a bit of the rules with 'collapse', so a
followup patch will add those tests/behaviors. This patch deals with
adding the AST node.
The 'tile' clause takes a series of integer constant expressions, or *.
The asterisk is now represented by a new OpenACCAsteriskSizeExpr node,
else this clause is very similar to others.
The 'collapse' clause on a 'loop' construct is used to specify how many
nested loops are associated with the 'loop' construct. It takes an
optional 'force' tag, and an integer constant expression as arguments.
There are many other restrictions based on the contents of the loop/etc,
but those are implemented in followup patches, for now, this patch just
adds the AST node and does basic argument checking on the loop-count.
These three clauses are all quite trivial, as they take no parameters.
They are mutually exclusive, and 'seq' has some other exclusives that
are implemented here.
The ONE thing that isn't implemented is 2.9's restriction (line 2010):
'A loop associated with a 'loop' construct that does not have a 'seq'
clause must be written to meet all the following conditions'.
Future clauses will require similar work, so it'll be done as a
followup.
This patch implements the 'loop' construct AST, as well as the basic
appertainment rule. Additionally, it sets up the 'parent' compute
construct, which is necessary for codegen/other diagnostics.
A 'loop' can apply to a for or range-for loop, otherwise it has no other
restrictions (though some of its clauses do).