…hese
We've finished all of the clauses/etc that we're going to use this
visitor for, so we can remove the SourceLocation we used just for that,
and replace all NYI with unreachables.
The 'cache' construct is an interesting one, in that it doesn't take any
clauses, and is exclusively a collection of variables. Lowering wise,
these just get added to the associated acc.loop. This did require
some work to ensure that the cache doesn't have 'vars' that aren't
inside of the loop, but Sema is taking care of that with a warning.
Otherwise this is just a fairly simple amount of lowering, where each
'var' in the list creates an acc.cache, which is added to the acc.loop.
PR #143720 adds a requirement to the ACC dialect that every acc.loop
must have a seq, independent, or auto attribute for the 'default'
device_type. The standard has rules for how this can be intuited:
orphan/parallel/parallel loop: independent
kernels/kernels loop: auto
serial/serial loop: seq, unless there is a gang/worker/vector, at which
point it should be 'auto'.
This patch implements all of this rule as a 'cleanup' step on the IR
generation for combined/loop operations. Note that the test impact is
much less since I inadvertently have my 'operation' terminating curley
matching the end curley from 'attribute' instead of the front of the
line, so I've added sufficient tests to ensure I captured the above.
Having the whole clause emission be in a header file ended up being
pragmatic, but ended up being a sizable negative for a variety of
reasons. This patch moves it to its own .cpp file and makes
CIRGenFunction instead call into the visitor via a template instead.
This is possible because the valid list of construct kinds is quite
finite, and easy to enumerate.
…structs
This is a partial implementation of the 'copy' lowering. It is missing 3
things, which are coming in future patches:
1- does not handle subscript/subarrays for emission as variables 2- does
not handle member expressions for emissions as variables 3- does not
handle modifier-list
1 and 2 are because of the complexity and should be split off into a
separate patch. 3 is because it isn't clear how the IR is going to
handle this, and I'd like to make sure it gets done 'all at once' when
the IR is updated to handle these, so I'm pushing that off to the
future.
This DOES however handle the complexity of having a acc.copyin and
acc.copyout, plus the additional complexity of the 'async' clause.
As can be seen by the comment, this ends up being a construct that is
going to be quite a lot of work in the future to make sure we properly
identify the upperbound, lowerbound, and step. For now, we just treat
the 'loop' as container so that we can put the 'for' loop into it.
In the future, we'll have to teach the OpenACC dialect how to derive the
upperbound, lowerbound, and step from the cir.for loop. Additionally,
we'll probably have to add a few more options to it so that we can give
it the recipes it needs to determine these for random access iterators.
For Integer and Pointer values, these should already be known.
The 'loop' emit for OpenACC is particularly complicated/involved, so it
makes sense to be in its own file. This patch splits it out into its own
file, as well as the clause emitter code (as loop is going to require
that).