This patch is the last of the 'firstprivate' clause lowering patches. It
takes the already generated 'copy' init from Sema and uses it to
generate the IR for the copy section of the recipe.
However, one thing that this patch had to do, was come up with a way to
hijack the decl registration in CIRGenFunction. Because these decls are
being created in a 'different' place, we need to remove the things we've
added. We could alternatively generate these 'differently', but it seems
worth a little extra effort here to avoid having to re-implement
variable initialization.
Depends on #153625
This patch adds support for statement expressions. It also changes
emitCompoundStmt and emitCompoundStmtWithoutScope to accept an Address
that the optional result is written to. This allows the creation of the
alloca ahead of the creation of the scope which saves us from hoisting
the alloca to its parent scope.
The original patch to implement basic lowering for firstprivate didn't
have the Sema work to change the name of the variable being generated
from openacc.private.init to openacc.firstprivate.init. I forgot about
that when I merged the Sema changes this morning, so the tests now
failed. This patch fixes those up.
Additionally, Suggested on #153622 post-commit, it seems like a good idea to
use a size of APInt that matches the size-type, so this changes us to use that
instead.
This patch implements the basic lowering infrastructure, but does not
quite implement the copy initialization, which requires #153622.
It does however pass verification for the 'copy' section, which just
contains a yield.
The OpenACC standard is going to change to clarify that init, shutdown,
and set should only have a single architecture in each 'device_type'
clause. This patch implements that restriction.
See: https://github.com/OpenACC/openacc-spec/pull/550
Previously, #151360 implemented 'private' clause lowering, but didn't
properly initialize the variables. This patch adds that behavior to make
sure we correctly get the constructor or other init called.
The private clause is the first with 'recipes', so a lot of
infrastructure is included here, including some MLIR dialect changes
that allow simple adding of a privatization. We'll likely get similar
for firstprivate and reduction.
Also, we have quite a bit of infrastructure in clause lowering to make
sure we have most cases we could think of covered.
At the moment, ONLY private is implemented, so all it requires is an
'init' segment (that doesn't call any copy operations), and potentially
a 'destroy' segment. However, actually calling 'init' functions on each
of the elements in them are not properly implemented, and will be in a
followup patch.
This patch implements all of that, and adds tests in a way that will be
useful for firstprivate as well.
These two both allow arrays as their variable references, but it is a
common thing to use sub-arrays as a way to get a pointer to act as an
array with other compilers. This patch adds these, with an
extension-warning.
The 'cache' construct is an interesting one, in that it doesn't take any
clauses, and is exclusively a collection of variables. Lowering wise,
these just get added to the associated acc.loop. This did require
some work to ensure that the cache doesn't have 'vars' that aren't
inside of the loop, but Sema is taking care of that with a warning.
Otherwise this is just a fairly simple amount of lowering, where each
'var' in the list creates an acc.cache, which is added to the acc.loop.
This implements the async, wait, if, and if_present (as well as
device_type, but that is a detail of async/wait) lowering. All of
these are implemented the same way they are for the compute constructs,
so this is a pretty mild amount of changes.
The 'update' construct has 3 'var-list' clauses, device, self, and host.
Each has a pretty simple data-operand type syntax in the IR, so this
patch implements them as well. At least one of those is required to be
present on an 'update', so we cannot do any lowering without them.
Note that 'self' and 'host' are aliases.
Similar to 'enter data', except the data clauses have a 'getdeviceptr'
operation before, so that they can properly use the 'exit' operation
correctly. While this is a touch awkward, it fits perfectly into the
existing infrastructure.
Same as with 'enter data', we had to add some add-functions for async
and wait.
'enter data' is a new construct type that requires one of the data
clauses, so we had to wait for all clauses to be ready before we could
commit this. Most of the clauses are simple, but there is a little bit
of work to get 'async' and 'wait' to have similar interfaces in the ACC
dialect, where helpers were added.
These three are once again are IR clones of what the compute
IR looks like, so this patch is just adding the implementation and
writing sufficient tests.
This lowering ends up being identical to 'create', except it is a
acc.nocreate for the start operation, and it doesn't permit modifier
list. This patch implements this by adding it to the list of permitted
handlers (along with compute), plus adds tests.
These work exactly the same way they do for compute constructs, so this
implements them identically and adds tests. The list of legal modifiers
is different, but that is all handled in Sema.
…ombined
This patch does the lowering of copyin (represented as a
acc.copyin/acc.delete), copyout (acc.create/acc.copyin), and create
(acc.create/acc.delete).
Additionally, it found a few problems with #144806, so it fixes those as
well.
Review #145600 and #145770 crossed, which caused compute-copy and
combined-copy tests to fail because of an insufficiently written 'check'
line for a cir.func, which didn't account for the linkage spec being
added. This patch adds that to fix the build.
Some of the 'data' clauses can have a 'modifier-list' which specifies
one of a few keywords from a list. This patch adds support for lowering
them following #144806.
We have to keep a separate enum from MLIR, since we have to keep
'always' around for semantic reasons, whereas the dialect doesn't
differentiate these.
This patch ensures we get these right for the only applicable clause so
far, which is 'copy'.
This change adds support for function linkage and visibility and related
attributes. Most of the test changes are generalizations to allow
'dso_local' to be accepted where we aren't specifically testing for it.
Some tests based on CIR inputs have been updated to add 'private' to
function declarations where required by newly supported interfaces.
The dso-local.c test has been updated to add specific tests for
dso_local being set correctly, and a new test, func-linkage.cpp tests
other linkage settings.
This change sets `comdat` correctly in CIR, but it is not yet applied to
functions when lowering to LLVM IR. That will be handled in a later
change.
PR #143720 adds a requirement to the ACC dialect that every acc.loop
must have a seq, independent, or auto attribute for the 'default'
device_type. The standard has rules for how this can be intuited:
orphan/parallel/parallel loop: independent
kernels/kernels loop: auto
serial/serial loop: seq, unless there is a gang/worker/vector, at which
point it should be 'auto'.
This patch implements all of this rule as a 'cleanup' step on the IR
generation for combined/loop operations. Note that the test impact is
much less since I inadvertently have my 'operation' terminating curley
matching the end curley from 'attribute' instead of the front of the
line, so I've added sufficient tests to ensure I captured the above.
Attach is identical to 'present', except it generates an acc.attach and
acc.detach. This patch implements these, just like the preivous handful
of clauses.
no_create has its own 'data-in', plus uses the 'delete' for the data-out
operation. Additionally, like all data clauses it uses the 'async'
functionality previous implemented. This patch implements no_create for
combined/compute constructs completely, and ensures that the feature is
tested.
'host_data' has its own Op kind, so this handles the lowering there, it
looks exactly like the other ones we've done so far, so nothing novel
here.
host_data takes 3 clauses, 1 of which is required.
'use_device' is required, and results in an acc.use_device operation,
which then feeds into the dataOperands list on acc.host_data.
'if_present' is a simple attribute on the operand.
'if' is a condition on the operand, identical to our other handling of
'if'.
This patch handles all of these.
The patch #142998 crossed in the air with #142862. This resulted in 2
of the tests from the former to not have the inlined function emitted.
This patch adds an additional function to force these to be emitted.
These ended up not being too much of a change, it just requires that we
properly emit a member expression,then use it in the varPtr. I also
fixed up the 'name' field to be the expression print, as that was
necessary to get this correct.
Finally, I added a TON of tests to convince myself that I've got this
correct, and hopefully the IR shows that.
This adds alignment support for GlobalOp, LoadOp, and StoreOp.
Tests which failed because cir.store/cir.load now print alignment were
updated with wildcard matches, except where the alignment was relevant
to the test. Tests which check for cir.store/cir.load in cases that
don't have explicit alignment were not updated.
New tests for alignment are alignment.c, align-load.c, and
align-store.c.
The array indexes(and sections) are represented by the acc.bounds
operation, which this ensures we fill in properly. The lowerbound is
required, so we always get that.
The upperbound or extent is required. We typically do extent, since that
is the 'length' as specified by ACC, but in cases where we have implicit
length, we use the extent instead.
It isn't clear when 'stride' should be anything besides 1, though by my
reading, since we have full-types in the emitted code, we should never
have it be anything but 1.
This patch enables these for copy on compute and combined constructs,
and makes sure to test everything I could think of for
combinations/permutations.
A recent change to the error message produced for unhandled compound
statements without scope introduced a failure in an OpenACC test that
was checking for the old error message. This change updates the test to
check for the new message.
These are identical in IR as the 'compute' constructs, but require a
little additional work since we have 2 operations to work around, not
just 1. Note that the test is nearly identical to the compute version,
except that the combined 'tag's are present, plus the 'loop' construct.
…structs
This is a partial implementation of the 'copy' lowering. It is missing 3
things, which are coming in future patches:
1- does not handle subscript/subarrays for emission as variables 2- does
not handle member expressions for emissions as variables 3- does not
handle modifier-list
1 and 2 are because of the complexity and should be split off into a
separate patch. 3 is because it isn't clear how the IR is going to
handle this, and I'd like to make sure it gets done 'all at once' when
the IR is updated to handle these, so I'm pushing that off to the
future.
This DOES however handle the complexity of having a acc.copyin and
acc.copyout, plus the additional complexity of the 'async' clause.
Implementation is 'trivial' as were the rest of the non data clauses, so
this implements them, finishing the last non-data/var-list clause for
combined constructs. Also ensures this is properly tested.
This clause requires that we attach it to the 'loop', and can generate
variables, so this is the first loop clause to require that we properly
set up the insertion location. This patch does so, as a part of
lowering 'tile' correctly.
These two require that we correctly set up the 'insertion points' for
the compute construct when doing a combined construct. This patch adds
that and verifies that we're doing it correctly.
This adds two clauses plus the infrastructure for emitting the clauses
on combined constructs. Combined constructs require two operations, so
this makes sure we emit on the 'correct' one. It DOES require that the
combined construct handling picks the correct one to put it on, AND sets
up the 'inserter' correctly, but these two clauses don't require an
inserter, so a future patch will get those.
Combined constructs are emitted a little oddly, in that they are the
first ones where there are two operations for a single construct. First,
the compute variant is emitted with 'combined(loop)', then the loop
operation is emitted with 'combined(<variant>)'. Each gets its own
normal terminator.
This patch does not yet implement clauses at all, since that is going to
require special attention to make sure we get the emitting of them
correct, since certain clauses go to different locations, and need their
insertion-points set correctly. So this patch sets it up so that we will
emit the 'not implemented' diagnostic for all clauses.
This clause requires an entire additional collection to keep track of
the gang 'kind' or 'type'. That work is maintained in the OpenACC
dialect functions. Otherwise, this is effectively the same as the
worker/vectors.