The 'vector' clause specifies the iterations to be executed in vector or
SIMD mode. There are some limitations on which associated compute
contexts may be associated with this and have arguments, but otherwise
this is a fairly unrestricted clause.
It DOES have region limits like 'gang' and 'worker'.
The worker clause specifies iterations of the loop/ that are executed in
parallel by distributing the iterations among the multiple works within
a single gang.
The sema rules for this type are simply that it cannot be combined with
a `kernel` construct with a `num_workers` clause, child `loop` clauses
cannot contain a `gang` or `worker` clause, and that the argument is oly
allowed when associated with a `kernel`.
The 'gang' clause is used to specify parallel execution of loops, thus
has some complicated rules depending on the 'loop's associated compute
construct. This patch implements all of those.
the 'tile' clause requires that it be followed by N (where N is the
number of size expressions) 'tightly nested loops'. This means the
same as it does in 'collapse', so much of the implementation is
simliar/shared with that.
The 'tile' clause shares quite a bit of the rules with 'collapse', so a
followup patch will add those tests/behaviors. This patch deals with
adding the AST node.
The 'tile' clause takes a series of integer constant expressions, or *.
The asterisk is now represented by a new OpenACCAsteriskSizeExpr node,
else this clause is very similar to others.
OpenACC Spec requires that each loop associated with a 'collapse' have
exactly 1 loop/nest. This is implemented in 2 parts: 1- diagnosing when
we see a 2nd loop at any level with an applicable 'collapse'
2- Diagnosing if we didn't see enough 'depth' of loops.
Other loops (non-for) are diagnosed by a previous patch, and other
intervening code will be diagnosed in a followup.
'collapse' makes the 'loop' construct apply to the next N nested loops,
and 'loop' requires all associated loops be 'for' loops (or range-for).
This patch adds this restriction, plus adds a number of infrastructure
changes to permit some of the other restrictions in the future to be
implemented.
'collapse' also requires that there is not any calls to other directives
inside of the collapse region, so this also implements that limitation.
The 'collapse' clause on a 'loop' construct is used to specify how many
nested loops are associated with the 'loop' construct. It takes an
optional 'force' tag, and an integer constant expression as arguments.
There are many other restrictions based on the contents of the loop/etc,
but those are implemented in followup patches, for now, this patch just
adds the AST node and does basic argument checking on the loop-count.
The OpenACC standard makes depending on side effects to be effectively
UB, so this patch ensures we handle them reaonably by making it a potentially
evaluated context, and ignoring cleanups.
This clause works identically as far as Sema is concerned, to the
'private' clause on compute constructs, so this simply adds tests and
unblocks the ASTNode generation and Sema checking when used on loop
clauses.
These three clauses are all quite trivial, as they take no parameters.
They are mutually exclusive, and 'seq' has some other exclusives that
are implemented here.
The ONE thing that isn't implemented is 2.9's restriction (line 2010):
'A loop associated with a 'loop' construct that does not have a 'seq'
clause must be written to meet all the following conditions'.
Future clauses will require similar work, so it'll be done as a
followup.
This clause is effectively identical to how this works on compute
clauses, however the list of clauses allowed after it are slightly
different. This enables the clause for the 'loop', and ensures we're
permitting the correct list.
This patch implements the 'loop' construct AST, as well as the basic
appertainment rule. Additionally, it sets up the 'parent' compute
construct, which is necessary for codegen/other diagnostics.
A 'loop' can apply to a for or range-for loop, otherwise it has no other
restrictions (though some of its clauses do).
I apparently missed a few other clauses as well when doing my initial
implementation, so this adds tests for all, and fixes up the few that
had problems.
This is something that I'll do better to keep an eye on, though
shouldn't be necessary once the rest of the clauses are implemented and
we can remove the 'default' case.
Seemingly I forgot to implement the appertainment checks when doing the
original device_type implementation, so we fell through to the 'not
implemented' section of the diagnostics.
This patch corrects the appertainment, so that we disallow it correctly.
I discovered while working on something else that we were using the
location of the directive name as the 'beginloc' which caused some
problems in a few places. This patch makes it so our beginloc is the
'#' as we originally designed, and then adds a DirectiveLoc concept to a
construct for use diagnosing the name.
'reduction' has a few restrictions over normal 'var-list' clauses:
1- On parallel, a num_gangs can only have 1 argument when combined with
reduction. These two aren't able to be combined on any other of the
compute constructs however.
2- The vars all must be 'numerical data types' types of some sort, or a
'composite of numerical data types'. A list of types is given in the
standard as a minimum, so we choose 'isScalar', which covers all of
these types and keeps types that are actually numeric. Other compilers
don't seem to implement the 'composite of numerical data types', though
we do.
3- Because of the above restrictions, member-of-composite is not
allowed, so any access via a memberexpr is disallowed. Array-element and
sub-arrays (aka array sections) are both permitted, so long as they meet
the requirements of #2.
This patch implements all of these for compute constructs.
device_type, also spelled as dtype, specifies the applicability of the
clauses following it, and takes a series of identifiers representing the
architectures it applies to. As we don't have a source for the valid
architectures yet, this patch just accepts all.
Semantically, this also limits the list of clauses that can be applied
after the device_type, so this implements that as well.
This reverts commit 06f04b2e27f2586d3db2204ed4e54f8b78fea74e.
This reapplies commit c4a9a374749deb5f2a932a7d4ef9321be1b2ae5d.
The build failures were caused by the patch depending on the order of
evaluation of arguments to a function. This reapplication separates out
the capture of one of the values.
This reverts commit c4a9a374749deb5f2a932a7d4ef9321be1b2ae5d.
This and the followup patch keep hitting an assert I wrote on the build
bots in a way that isn't clear. Reverting so I can fix it without a
rush.
device_type, also spelled as dtype, specifies the applicability of the
clauses following it, and takes a series of identifiers representing the
architectures it applies to. As we don't have a source for the valid
architectures yet, this patch just accepts all.
Semantically, this also limits the list of clauses that can be applied
after the device_type, so this implements that as well.
'wait' takes a few int-exprs (well, a series of async-arguments, but
those are effectively just an int-expr), plus a pair of tags. This
patch adds the support for this to the AST, and does the appropriate
semantic analysis for them.
This is a pretty simple clause, it takes an 'async-argument', which
effectively needs to be just parsed as an 'int' argument, since it can
be an arbitrarly integer at runtime (and negative values are legal for
implementation defined values).
This patch also cleans up the async-argument parsing, so 'wait' got some
minor quality-of-life improvements for parsing (both clause and
construct).
These two are very similar to the other 'var-list' variants, except they
require that the type of the variable be a pointer. This patch
implements that restriction.
Like 'copy', these also have alternate names, so this implements that as
well. Additionally, these have an optional tag of either 'readonly' or
'zero' depending on the clause.
Otherwise, this is a pretty rote implementation of the clause, as there
aren't any special rules for it.
Like present, no_create, and first_private, copy is a clause that takes
just a var-list, and follows the same rules as the others.
The one unique part of this clause is that it ALSO supports two
deprecated/backwards-compatibility spellings, so this patch adds them
and implements them.
This implementation takes quite a bit from the OMP implementation of
array sections, but only has to enforce the rules as applicable to
OpenACC. Additionally, it does its best to create an AST node (with the
assistance of RecoveryExprs) with as much checking done as soon as
possible in the case of instantiations.
The private clause is the first that takes a 'var-list', thus this has a
lot of additional work to enable the var-list type. A 'var' is a
traditional variable reference, subscript, member-expression, or
array-section, so checking of these is pretty minor.
Note: This ran into some issues with array-sections (aka sub-arrays)
that will be fixed in a follow-up patch.
OpenACC is going to need an array sections implementation that is a
simpler version/more restrictive version of the OpenMP version.
This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`,
then adds an enum to choose between the two.
This also fixes a couple of 'drive-by' issues that I discovered on the way,
but leaves the OpenACC Sema parts reasonably unimplemented (no semantic
analysis implementation), as that will be a followup patch.
num_gangs takes an 'int-expr-list', for 'parallel', and an 'int-expr'
for 'kernels'. This patch changes the parsing to always parse it as an
'int-expr-list', then correct the expression count during Sema. It also
implements the rest of the semantic analysis changes for this clause.
The 'vector_length' clause is semantically identical to the
'num_workers' clause, in that it takes a mandatory single int-expr. This
is implemented identically to it.
`self` clauses on compute constructs take an optional condition
expression. We again limit the implementation to ONLY compute constructs
to ensure we get all the rules correct for others. However, this one
will be particularly complicated, as it takes a `var-list` for `update`,
so when we get to that construct/clause combination, we need to do that
as well.
This patch also furthers uses of the `OpenACCClauses.def` as it became
useful while implementing this (as well as some other minor refactors as
I went through).
Finally, `self` and `if` clauses have an interaction with each other, if
an `if` clause evaluates to `true`, the `self` clause has no effect.
While this is intended and can be used 'meaningfully', we are warning on
this with a very granular warning, so that this edge case will be
noticed by newer users, but can be disabled trivially.
Like with the 'default' clause, this is being applied to only Compute
Constructs for now. The 'if' clause takes a condition expression which
is used as a runtime value.
This is not a particularly complex semantic implementation, as there
isn't much to this clause, other than its interactions with 'self',
which will be managed in the patch to implement that.
As a followup to my previous commits, this is an implementation of a
single clause, in this case the 'default' clause. This implements all
semantic analysis for it on compute clauses, and continues to leave it
rejected for all others (some as 'doesnt appertain', others as 'not
implemented' as appropriate).
This also implements and tests the TreeTransform as requested in the
previous patch.
Now that we have AST nodes for OpenACC Clauses, this patch adds their
creation to Sema and makes the Parser call all the required functions.
This also redoes TreeTransform to work with the clauses/make sure they
are transformed.
Much of this is NFC, since there is no clause we can test this behavior
with. However, there IS one noticable change; we are now no longer
diagnosing that a clause is 'not implemented' unless it there was no
errors parsing its parameters. This is because it cleans up how we
create and diagnose clauses.
This is a follow-up to #84184. Multiple reviewers there pointed out to
me that we should have a common base class for `Sema` and `SemaOpenACC`
to avoid code duplication for common helpers like `getLangOpts()`. On
top of that, `Diag()` function was requested for `SemaOpenACC`. This
patch delivers both.
The intent is to keep `SemaBase` as small as possible, as things there
are globally available across `Sema` and its parts without any
additional effort from usage side. Overused, this can undermine the
whole endeavor of splitting `Sema` apart.
Apart of shuffling code around, this patch introduces a helper private
function `SemaDiagnosticBuilder::getDeviceDeferredDiags()`, the sole
purpose of which is to encapsulate member access into (incomplete)
`Sema` for function templates defined in the header, where `Sema` can't
be complete.
As a first step in adding clause support for OpenACC to Semantic
Analysis, this patch adds the 'base' AST nodes required for clauses.
This patch has no functional effect at the moment, but followup patches
will add the semantic analysis of clauses (plus individual clauses).
This patch moves OpenACC parts of `Sema` into a separate class
`SemaOpenACC` that is placed in a separate header `Sema/SemaOpenACC.h`.
This patch is intended to be a model of factoring things out of `Sema`,
so I picked a small OpenACC part.
Goals are the following:
1) Split `Sema` into manageable parts.
2) Make dependencies between parts visible.
3) Improve Clang development cycle by avoiding recompiling unrelated
parts of the compiler.
4) Avoid compile-time regressions.
5) Avoid notational regressions in the code that uses Sema.
So far, all the work we've done for compute constructs has only used
'parallel'. This patch does the work to enable the same logic for
'serial' and 'kernels' constructs as well, since they are the same
semantic behavior.
This patch Implements AST node creation and appertainment enforcement
for 'parallel', as well as changes the 'not implemented' messages to be
more specific. It does not deal with clauses/clause legality, nor a few
of the other rules from the standard, but this gets us most of the way
for a framework for future construct implementation.