The 'device_type' clause modifies how the clauses that are legal after
it (seq, worker, vector, gang, bind) work. Previous patches were aware
of how that was going to happen, thanks to experience with doing the
same work on other constructs/clauses, so this is mostly just a repeat
of those.
Tests for the first 4 and interactions with them are included, but
'bind' is not yet implemented, so its device_type tests will be added
when it is lowered.
…hese
We've finished all of the clauses/etc that we're going to use this
visitor for, so we can remove the SourceLocation we used just for that,
and replace all NYI with unreachables.
This patch does the lowering for a 'declare' construct that is not a
function-local-scope. It also does the lowering for 'create', which has
an entry-op of create and exit-op of delete.
Global/NS/Struct scope 'declare's emit a single 'acc_ctor' and
'acc_dtor' (except in the case of 'link') per variable referenced. The
ctor is the entry op followed by a declare_enter. The dtor is a
get_device_ptr, followed by a declare_exit, followed by a delete(exit
op). This DOES include any necessary bounds.
This patch implements all of the above. We use a separate 'visitor' for
the clauses here since it is particularly different from the other uses,
AND there are only 4 valid clauses. Additionally, we had to split the
modifier conversion into its own 'helpers' file, which will hopefully
get some additional use in the future.
Just like the last handful of clauses, this is a pretty simple one,
doing device_resident (Entry op: declare_device_resident, and exit:
delete). This should be the last of the 'local' declare patches.
Just like the last handful of patches that did copy, copyin, copyout,
create, etc, this patch has the exact same behavior, except the
entry op is a present, and the exit is delete.
This is identical to 'copy' and 'copyin', except it uses 'create' and
'copyout' as its entry/exit op. This patch adds the same tests, and
similar code for all of it.
This is exactly like the 'copy', except the exit operation is a 'delete'
instead of a 'copyout'. Also, creating the 'delete' op has one less
argument to it, so we have to do some special handling when creating
that.
This patch implements the lowering for the 'copy' clause for a
function-local declare directive.
This is the first of the clauses that requires a 'cleanup' step, so it
also includes some basic infrastructure for that. Fortunately there are
only 8 clauses (only 6 of which require cleanup), so the if/else chain
won't get too long.
Also fortunately, we don't have to include any of the AST components, as
it is possible to tell all the required details from the entry operation
itself.
This is very similar to the 'link' that was done in the last patch,
except this works on all storage, but only on pointers. This also shows
a bit more of how the enter/exit pairs work in the test.
Implementation itself is very simple, as it is just properly handling it
in the clause handler.
'declare' is a declaration directive, so it can appear at 3 places:
Global/NS scope, class scope, or local scope. This patch implements ONLY
the 'local' scope lowering for 'declare'.
A 'declare' is lowered as a 'declare_enter' and 'declare_exit'
operation, plus data operands like all others. Sema restricts the form
of some of these, but they are otherwise identical.
'declare' DOES require at least 1 clause for the examples to make sense,
so this ALSO implements 'link', which is the 'simpliest' one. It is ONLY
attached to the 'declare_enter', and doesn't require any additional work
besides a very small addition to how we handle clauses.
The OpenACC spec allows only `v = x` form for atomic-read, and only when
both are L-values. The result is this ends up being a pretty trivial
patch, however it adds a decent amount of infrastructure for the other
forms of atomic.
Additionally, the 3.4 spec starts allowing the 'if' clause on atomic,
which has recently been added to the ACC dialect. This patch also
ensures that can be lowered as well. Extensive testing of this feature
was done on other clauses, so there isn't much further work/testing to
be done for it.
Following on the Sema changes, this does the lowering for all of the
operators that can be done as a compound operator. Lowering is very
simply looping through the objects based on array/compound/etc, and
doing a call to the operation.
I originally expected that we were going to need the initExpr stored
separately from the allocaDecl when doing arrays/pointers, however after
implementing it, we found that the idea of having the allocaDecl just
store its init directly still works perfectly. This patch removes the
extra field from the AST.
A previous review comment pointed out that operations with only a single
result implicitly convert to `mlir::Value`. This patch removes the
explicit use of `getResult` where it is unnecessary in OpenACC lowering.
However, there ARE a few cases where it is necessary where the
`mlir::ValueRange` implicit constructor from a single value is being
used, so those are untouched.
Additionally, while the previous patch was being committed (#161382), a
second patch (#161431) changed the format of cir.casts, so this patch
fixes the additional test lines for that as well.
It was noticed on a previous patch that I could have emitted recipes in
lexical order instead of reverse order, which would improve the
readability of a lot of tests. This patch implements that, and changes
all of the required test.
After previous implementation, I discovered that we were both doing
arrays incorrectly for recipes, plus didn't get the pointer allocations
done correctly. This patch is the first of a few in a series that
attempts to make sure we get all pointers/arrays correct.
This patch is limited to just 'private' and destructors, which
simplifies the review significantly. Destructors are simply looped
through and called at each level.
The 'recipe-decl' is the 'least bounded' (that is, the type of the
expression, in the type of `int[5] i; #pragma acc parallel
private(i[1])`, the type of the `recipe-decl` is `int`. This allows
us to do init/destruction at the element level.
This patch also adds infrastructure for the rest of the series of
private (for the init section), as well as extensive testing for
'private', with a lot of 'TODO' locations.
Future patches will fill these in, but at the moment, there is an NYI
warning for bounds, so a number of tests are updated to handle that.
The recipe generation was dependent on the clause kind, which meant we
had all of the recipe generation duplicated in each of clauses. This
patch copy/pastes all of them into their own type to do recipe
generation, which should reduce clang's size.
Additionally, we've moved it off into its own file, which should make
readability/organization improvements.
Expressions/references with 'bounds' are going to need to do
initialization significantly differently, so we need to have the
initializer and the declaration 'separate' in the future. This patch
splits the AST node into two, and normalizes them a bit.
Additionally, since this required significant work on the recipe
generation, this patch also does a bit of a refactor to improve
readability and future expansion, now that we have a good understanding
of how these are going to look.
Patch #156545 is introducing a different syntax for the 'destroy'
section of a recipe, which takes the 'original' value as the first
argument, and the one-to-be-destroyed as the 2nd. This patch corrects
the lowering to match that signature.
As mentioned in a previous review, we aren't properly generating
init/destroy/copy (combiner will need to be done correctly too!) regions
for recipe generation. In the case where these have 'bounds', we can do
a much better job of figuring out the type and how much needs to be
done, but that is going to be its own engineering effort.
For now, add an NYI as a note to come back to this.
These four operators have an initial value of 0, so they are able to use
C/C++ 'zero init'. This patch adds the infrastructure to the Sema init
calculations to differentiate based on the reduction operator, then
enables emission of the inits in CodeGen (which should work for all
inits, once generated).
The rest of this test is just updating validation to make sure that the
inits happen correctly for all 4 operators.
This patch implements basic reduction recipe lowering, plus adds a bunch
of tests for it that should be meaningful later. At the moment, all this
does is ensure that we get the init 'alloca' set right (the actual
initializer isn't done correctly yet, and will be in a followup), an
empty combiner (though the type of certain operations probably has to be
different as well, when we get to those), and a full-destruction, as we
already have the infrastructure for it.
I realized while messing with other things that I'd written all of the
recipe tests for C++, so this patch adds a bunch of tests for C mode.
The assert wasn't quite accurate (as C default init doesn't really do
anything/have an AST node), so that is corrected. Also, the lack of
cir.copy causes some of the firstprivate tests to be incomplete, so
added TODOs for that as well.
This changes a bunch of places which use getAs<TagType>, including
derived types, just to obtain the tag definition.
This is preparation for #155028, offloading all the changes that PR used
to introduce which don't depend on any new helpers.
This patch is the last of the 'firstprivate' clause lowering patches. It
takes the already generated 'copy' init from Sema and uses it to
generate the IR for the copy section of the recipe.
However, one thing that this patch had to do, was come up with a way to
hijack the decl registration in CIRGenFunction. Because these decls are
being created in a 'different' place, we need to remove the things we've
added. We could alternatively generate these 'differently', but it seems
worth a little extra effort here to avoid having to re-implement
variable initialization.
This patch implements the basic lowering infrastructure, but does not
quite implement the copy initialization, which requires #153622.
It does however pass verification for the 'copy' section, which just
contains a yield.
In preperation of the firstprivate implementation, this separates out
some functions to make it easier to read.
Additionally, it cleans up the VarDecl->alloca relationship, which will
prevent issues if we have to re-use the same vardecl for a future
generated recipe (and causes concerns in firstprivate later).
Previously, #151360 implemented 'private' clause lowering, but didn't
properly initialize the variables. This patch adds that behavior to make
sure we correctly get the constructor or other init called.
The private clause is the first with 'recipes', so a lot of
infrastructure is included here, including some MLIR dialect changes
that allow simple adding of a privatization. We'll likely get similar
for firstprivate and reduction.
Also, we have quite a bit of infrastructure in clause lowering to make
sure we have most cases we could think of covered.
At the moment, ONLY private is implemented, so all it requires is an
'init' segment (that doesn't call any copy operations), and potentially
a 'destroy' segment. However, actually calling 'init' functions on each
of the elements in them are not properly implemented, and will be in a
followup patch.
This patch implements all of that, and adds tests in a way that will be
useful for firstprivate as well.
The 'cache' construct is an interesting one, in that it doesn't take any
clauses, and is exclusively a collection of variables. Lowering wise,
these just get added to the associated acc.loop. This did require
some work to ensure that the cache doesn't have 'vars' that aren't
inside of the loop, but Sema is taking care of that with a warning.
Otherwise this is just a fairly simple amount of lowering, where each
'var' in the list creates an acc.cache, which is added to the acc.loop.
This implements the async, wait, if, and if_present (as well as
device_type, but that is a detail of async/wait) lowering. All of
these are implemented the same way they are for the compute constructs,
so this is a pretty mild amount of changes.
The 'update' construct has 3 'var-list' clauses, device, self, and host.
Each has a pretty simple data-operand type syntax in the IR, so this
patch implements them as well. At least one of those is required to be
present on an 'update', so we cannot do any lowering without them.
Note that 'self' and 'host' are aliases.
Similar to 'enter data', except the data clauses have a 'getdeviceptr'
operation before, so that they can properly use the 'exit' operation
correctly. While this is a touch awkward, it fits perfectly into the
existing infrastructure.
Same as with 'enter data', we had to add some add-functions for async
and wait.
'enter data' is a new construct type that requires one of the data
clauses, so we had to wait for all clauses to be ready before we could
commit this. Most of the clauses are simple, but there is a little bit
of work to get 'async' and 'wait' to have similar interfaces in the ACC
dialect, where helpers were added.
These three are once again are IR clones of what the compute
IR looks like, so this patch is just adding the implementation and
writing sufficient tests.
This lowering ends up being identical to 'create', except it is a
acc.nocreate for the start operation, and it doesn't permit modifier
list. This patch implements this by adding it to the list of permitted
handlers (along with compute), plus adds tests.
These work exactly the same way they do for compute constructs, so this
implements them identically and adds tests. The list of legal modifiers
is different, but that is all handled in Sema.
…ombined
This patch does the lowering of copyin (represented as a
acc.copyin/acc.delete), copyout (acc.create/acc.copyin), and create
(acc.create/acc.delete).
Additionally, it found a few problems with #144806, so it fixes those as
well.
Some of the 'data' clauses can have a 'modifier-list' which specifies
one of a few keywords from a list. This patch adds support for lowering
them following #144806.
We have to keep a separate enum from MLIR, since we have to keep
'always' around for semantic reasons, whereas the dialect doesn't
differentiate these.
This patch ensures we get these right for the only applicable clause so
far, which is 'copy'.
Attach is identical to 'present', except it generates an acc.attach and
acc.detach. This patch implements these, just like the preivous handful
of clauses.
no_create has its own 'data-in', plus uses the 'delete' for the data-out
operation. Additionally, like all data clauses it uses the 'async'
functionality previous implemented. This patch implements no_create for
combined/compute constructs completely, and ensures that the feature is
tested.
'host_data' has its own Op kind, so this handles the lowering there, it
looks exactly like the other ones we've done so far, so nothing novel
here.
host_data takes 3 clauses, 1 of which is required.
'use_device' is required, and results in an acc.use_device operation,
which then feeds into the dataOperands list on acc.host_data.
'if_present' is a simple attribute on the operand.
'if' is a condition on the operand, identical to our other handling of
'if'.
This patch handles all of these.
These ended up not being too much of a change, it just requires that we
properly emit a member expression,then use it in the varPtr. I also
fixed up the 'name' field to be the expression print, as that was
necessary to get this correct.
Finally, I added a TON of tests to convince myself that I've got this
correct, and hopefully the IR shows that.
The array indexes(and sections) are represented by the acc.bounds
operation, which this ensures we fill in properly. The lowerbound is
required, so we always get that.
The upperbound or extent is required. We typically do extent, since that
is the 'length' as specified by ACC, but in cases where we have implicit
length, we use the extent instead.
It isn't clear when 'stride' should be anything besides 1, though by my
reading, since we have full-types in the emitted code, we should never
have it be anything but 1.
This patch enables these for copy on compute and combined constructs,
and makes sure to test everything I could think of for
combinations/permutations.