The atomic construct is a particularly complicated one. The directive
itself is pretty simple, it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.
'read' accepts:
v = x;
'write' accepts:
x = expr;
'update' (or no clause) accepts:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
'capture' accepts either a compound statement, or:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
IF 'capture' has a compound statement, it accepts:
{v = x; x binop= expr; }
{x binop= expr; v = x; }
{v = x; x = x binop expr; }
{v = x; x = expr binop x; }
{x = x binop expr ;v = x; }
{x = expr binop x; v = x; }
{v = x; x = expr; }
{v = x; x++; }
{v = x; ++x; }
{x++; v = x; }
{++x; v = x; }
{v = x; x--; }
{v = x; --x; }
{x--; v = x; }
{--x; v = x; }
While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.
This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
This completes the implementation of 'update' by implementing its last
restriction. This restriction requires at least 1 of the 'self', 'host',
or 'device' clauses.
These two clauses just take a 'var-list' and specify where the variables
should be copied from/to. This patch implements the AST nodes for them
and ensures they properly take a var-list.
The 'self' clause is an unfortunately difficult one, as it has a
significantly different meaning between 'update' and the other
constructs. This patch introduces a way for the 'self' clause to work
as both. I considered making this two separate AST nodes (one for
'self' on 'update' and one for the others), however this makes the
automated macros/etc for supporting a clause break.
Instead, 'self' has the ability to act as either a condition or as a
var-list clause. As this is the only one of its kind, it is implemented
all within it. If in the future we have more that work like this, we
should consider rewriting a lot of the macros that we use to make
clauses work, and make them separate ast nodes.
This executable construct has a larger list of clauses than some of the
others, plus has some additional restrictions. This patch implements
the AST node, plus the 'cannot be the body of a if, while, do, switch,
or label' statement restriction. Future patches will handle the
rest of the restrictions, which are based on clauses.
A fairly simple one, only valid on the 'set' construct, this clause
takes an int expression. Most of the work was already done as a part of
parsing, so this patch ends up being a lot of infrastructure.
The 'set' construct is another fairly simple one, it doesn't have an
associated statement and only a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its for clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled, it needs a complete
new implementation.
This is a very simple sema implementation, and just required AST node
plus the existing diagnostics. This patch adds tests and adds the AST
node required, plus enables it for 'init' and 'shutdown' (only!)
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.
All 4 of the 'data' constructs have a requirement that at least 1 of a
small list of clauses must appear on the construct. This patch
implements that restriction, and updates all of the tests it takes to
do so.
This is a clause that is only valid on 'host_data' constructs, and
identifies variables which it should use the current device address.
From a Sema perspective, the only thing novel here is mild changes to
how ActOnVar works for this clause, else this is very much like the rest
of the 'var-list' clauses.
'delete' is another clause that has very little compile-time
implication, but needs a full AST that takes a var list. This patch
ipmlements it fully, plus adds sufficient test coverage.
This is another new clause specific to 'exit data' that takes a pointer
argument. This patch implements this the same way we do a few other
clauses (like attach) that have the same restrictions.
The 'if_present' clause controls the replacement of addresses in the
var-list in current device memory. This clause can only go on
'host_device'. From a Sema perspective, there isn't anything to do
beyond add this to AST and pass it on.
This is a very simple clause as far as sema is concerned. It is only
valid on 'exit data', and doesn't have any rules involving it, so it is
simply applied and passed onto the MLIR.
'copy' is another that is identical in behavior on 'data' as far as
semantic analysis is concerned as the compute constructs, so this patch
adds tests and enables 'copy'.
This is once again simply enabling this for 'data', 'enter data', and
'exit data' (and ensuring we error for 'host_data'). Implementation is
very simply to enable it rather than emit the not-implemented
diagnostic.
Semantically this is identical to all other constructs with this tag,
except in this case the 'wait' and 'async' are the only ones allowed
after it. This patch implements that rule using the existing
infrastructure.
These constructs are all very similar and closely related, so this patch
creates the AST nodes for them, serialization, printing/etc.
Additionally the restrictions are all added as tests/todos in the tests,
as those will have to be implemented once we get those clauses implemented.
Once again, this is a clause on a combined construct that does almost
exactly what the loop/compute construct version does, only with some sl
ightly different evaluation rules/sema rules as it doesn't have to
consider the parent, just the 'combined' construct. The two sets of
rules for reduction on loop and compute are fine together, so this
ensures they are all enforced for this too.
The 'gangs' 'num_gangs' 'reduction' diagnostic (Dim>1) had to be applied
to num_gangs as well, as it previously wasn't permissible to get in this
situation, but we now can.
Similar to 'worker', the 'vector' clause has some rules that needed to
be applied on its argument legality that for combined constructs need to
look at the current construct, not the 'effective' parent construct.
Additionally, it has some interaction with `vector_length` that needed
to be encoded as well. This patch implements it.
This is very similar to 'gang', except with fewer restrictions, and only an
interaction with 'num_workers', plus disallowing 'gang' and 'worker' in
its associated statement. This patch implements this, the same as how
'gang' implemented it.
This one is a bit complicated, as it has some interesting interactions,
as 'gang' Sema is required to look at its containing compute construct.
Except in the case of a combined construct, they are the same. This
resulted in a large refactor of the checking code for CheckGangExpr,
plus some additional work on the diagnostics for its interaction with
'num_gangs' and 'vector'/'worker'.
The original implementation rejected some valid constructs. The rule is
supposed to be:
Gang-on-Kernel cannot have a gang in its region
Worker cannot have a worker or gang in its region
Vector cannot have worker, gang, or vector in its region.
The previous implementation improperly implemented that vector wasnt'
allowed in the other two. This patch fixes it and adds testing for it.
'num_gangs', 'num_workers', and 'vector_length' are similar to
eachother, and are all the same implementation as for compute
constructs, so this patch just enables them and adds the necessary
testing to ensure they work correctly. These will get more complicated
when they get combined with 'gang', 'worker', 'vector' and 'reduction',
but those restrictions will be implemented when those clauses are
enabled.
Like collapse, this clause has only mild changes that need to be made.
Most of the associated work for the RAII container was already done
previously, so this is mostly just updating the diagnostics to properly
emit the right construct.
Most of the restrictions on 'collapse' involve the contents of the loop,
so this extends enforcement of all of that to all combined construct
loops, and alters the diagnostics to better reflect the construct it is
associated with.
This is the last set of 'no op' changes, and are all incredibly similar,
so they are being done together. They work the exact same for combined
constructs as they do for compute constructs, so this adds tests and
enables them.
Once again a situation where the combined and compute do the exact same
thing as far as Sema/AST/etc is concerned, so this patch adds tests and
enables it.
This is another clause where the parsing does all the required
enforcement besides the construct it appertains to, so this patch
removes the restriction and adds sufficient test coverage for combined
constructs.
This is another pair of clauses where the work is already done from
previous constructs, so this just has to allow them and include tests
for them. This patch adds testing, does a few little cleanup bits on the
clause checking, and enables these.