This introduces a new class 'UnsignedOrNone', which models a lite
version of `std::optional<unsigned>`, but has the same size as
'unsigned'.
This replaces most uses of `std::optional<unsigned>`, and similar
schemes utilizing 'int' and '-1' as sentinel.
Besides the smaller size advantage, this is simpler to serialize, as its
internal representation is a single unsigned int as well.
This was added in OpenACC PR #511 in the 3.4 branch. From an AST/Sema
perspective this is pretty trivial as the infrastructure for 'if'
already exists, however the atomic construct needed to be taught to take
clauses. This patch does that and adds some testing to do so.
This is the last item of the OpenACC 3.3 spec. It includes the
implicit-name version of 'routine', plus significant refactorings to
make the two work together. The implicit name version is represented as
an attribute on the function call. This patch also implements the
clauses for the implicit-name version, as well as the A.3.4 warning.
The 'bind' clause allows the renaming of a function during code
generation. There are a few rules about when this can/cannot happen,
and it takes either a string or identifier (previously mis-implemetned
as ID-expression) argument.
Note there are additional rules to this in the implicit-function routine
case, but that isn't implemented in this patch, as implicit-function
routine is not yet implemented either.
These 4 clauses are mutually exclusive, AND require at least one of
them. Additionally, gang has some additional restrictions in that only
the 'dim' specifier is permitted. This patch implements all of this, and
ends up refactoring the handling of each of these clauses for
readabililty.
The 'routine' construct has two forms, one which takes the name of a
function that it applies to, and another where it implicitly figures it
out based on the next declaration. This patch implements the former with
the required restrictions on the name and the function-static-variables
as specified.
What has not been implemented is any clauses for this, any of the A.3.4
warnings, or the other form.
This statement level construct takes no clauses and has no associated
statement, and simply labels a number of array elements as valid for
caching. The implementation here is pretty simple, but it is a touch of
a special case for parsing, so the parsing code reflects that.
The 'declare' construct is the first of two 'declaration' level
constructs, so it is legal in any place a declaration is, including as a
statement, which this accomplishes by wrapping it in a DeclStmt. All
clauses on this have a 'same scope' requirement, which this enforces as
declaration context instead, which makes it possible to implement these
as a template.
The 'link' and 'device_resident' clauses are also added, which have some
similar/small restrictions, but are otherwise pretty rote.
This patch implements all of the above.
The atomic construct is a particularly complicated one. The directive
itself is pretty simple, it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.
'read' accepts:
v = x;
'write' accepts:
x = expr;
'update' (or no clause) accepts:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
'capture' accepts either a compound statement, or:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
IF 'capture' has a compound statement, it accepts:
{v = x; x binop= expr; }
{x binop= expr; v = x; }
{v = x; x = x binop expr; }
{v = x; x = expr binop x; }
{x = x binop expr ;v = x; }
{x = expr binop x; v = x; }
{v = x; x = expr; }
{v = x; x++; }
{v = x; ++x; }
{x++; v = x; }
{++x; v = x; }
{v = x; x--; }
{v = x; --x; }
{x--; v = x; }
{--x; v = x; }
While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.
This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
This completes the implementation of 'update' by implementing its last
restriction. This restriction requires at least 1 of the 'self', 'host',
or 'device' clauses.
These two clauses just take a 'var-list' and specify where the variables
should be copied from/to. This patch implements the AST nodes for them
and ensures they properly take a var-list.
The 'self' clause is an unfortunately difficult one, as it has a
significantly different meaning between 'update' and the other
constructs. This patch introduces a way for the 'self' clause to work
as both. I considered making this two separate AST nodes (one for
'self' on 'update' and one for the others), however this makes the
automated macros/etc for supporting a clause break.
Instead, 'self' has the ability to act as either a condition or as a
var-list clause. As this is the only one of its kind, it is implemented
all within it. If in the future we have more that work like this, we
should consider rewriting a lot of the macros that we use to make
clauses work, and make them separate ast nodes.
This executable construct has a larger list of clauses than some of the
others, plus has some additional restrictions. This patch implements
the AST node, plus the 'cannot be the body of a if, while, do, switch,
or label' statement restriction. Future patches will handle the
rest of the restrictions, which are based on clauses.
A fairly simple one, only valid on the 'set' construct, this clause
takes an int expression. Most of the work was already done as a part of
parsing, so this patch ends up being a lot of infrastructure.
The 'set' construct is another fairly simple one, it doesn't have an
associated statement and only a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its for clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled, it needs a complete
new implementation.
This is a very simple sema implementation, and just required AST node
plus the existing diagnostics. This patch adds tests and adds the AST
node required, plus enables it for 'init' and 'shutdown' (only!)
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.
All 4 of the 'data' constructs have a requirement that at least 1 of a
small list of clauses must appear on the construct. This patch
implements that restriction, and updates all of the tests it takes to
do so.
This is a clause that is only valid on 'host_data' constructs, and
identifies variables which it should use the current device address.
From a Sema perspective, the only thing novel here is mild changes to
how ActOnVar works for this clause, else this is very much like the rest
of the 'var-list' clauses.
'delete' is another clause that has very little compile-time
implication, but needs a full AST that takes a var list. This patch
ipmlements it fully, plus adds sufficient test coverage.
This is another new clause specific to 'exit data' that takes a pointer
argument. This patch implements this the same way we do a few other
clauses (like attach) that have the same restrictions.
The 'if_present' clause controls the replacement of addresses in the
var-list in current device memory. This clause can only go on
'host_device'. From a Sema perspective, there isn't anything to do
beyond add this to AST and pass it on.
This is a very simple clause as far as sema is concerned. It is only
valid on 'exit data', and doesn't have any rules involving it, so it is
simply applied and passed onto the MLIR.
'copy' is another that is identical in behavior on 'data' as far as
semantic analysis is concerned as the compute constructs, so this patch
adds tests and enables 'copy'.
This is once again simply enabling this for 'data', 'enter data', and
'exit data' (and ensuring we error for 'host_data'). Implementation is
very simply to enable it rather than emit the not-implemented
diagnostic.
Semantically this is identical to all other constructs with this tag,
except in this case the 'wait' and 'async' are the only ones allowed
after it. This patch implements that rule using the existing
infrastructure.
These constructs are all very similar and closely related, so this patch
creates the AST nodes for them, serialization, printing/etc.
Additionally the restrictions are all added as tests/todos in the tests,
as those will have to be implemented once we get those clauses implemented.
Once again, this is a clause on a combined construct that does almost
exactly what the loop/compute construct version does, only with some sl
ightly different evaluation rules/sema rules as it doesn't have to
consider the parent, just the 'combined' construct. The two sets of
rules for reduction on loop and compute are fine together, so this
ensures they are all enforced for this too.
The 'gangs' 'num_gangs' 'reduction' diagnostic (Dim>1) had to be applied
to num_gangs as well, as it previously wasn't permissible to get in this
situation, but we now can.
Similar to 'worker', the 'vector' clause has some rules that needed to
be applied on its argument legality that for combined constructs need to
look at the current construct, not the 'effective' parent construct.
Additionally, it has some interaction with `vector_length` that needed
to be encoded as well. This patch implements it.
This is very similar to 'gang', except with fewer restrictions, and only an
interaction with 'num_workers', plus disallowing 'gang' and 'worker' in
its associated statement. This patch implements this, the same as how
'gang' implemented it.
This one is a bit complicated, as it has some interesting interactions,
as 'gang' Sema is required to look at its containing compute construct.
Except in the case of a combined construct, they are the same. This
resulted in a large refactor of the checking code for CheckGangExpr,
plus some additional work on the diagnostics for its interaction with
'num_gangs' and 'vector'/'worker'.
The original implementation rejected some valid constructs. The rule is
supposed to be:
Gang-on-Kernel cannot have a gang in its region
Worker cannot have a worker or gang in its region
Vector cannot have worker, gang, or vector in its region.
The previous implementation improperly implemented that vector wasnt'
allowed in the other two. This patch fixes it and adds testing for it.
'num_gangs', 'num_workers', and 'vector_length' are similar to
eachother, and are all the same implementation as for compute
constructs, so this patch just enables them and adds the necessary
testing to ensure they work correctly. These will get more complicated
when they get combined with 'gang', 'worker', 'vector' and 'reduction',
but those restrictions will be implemented when those clauses are
enabled.
Like collapse, this clause has only mild changes that need to be made.
Most of the associated work for the RAII container was already done
previously, so this is mostly just updating the diagnostics to properly
emit the right construct.