These two clauses just take a 'var-list' and specify where the variables
should be copied from/to. This patch implements the AST nodes for them
and ensures they properly take a var-list.
The 'self' clause is an unfortunately difficult one, as it has a
significantly different meaning between 'update' and the other
constructs. This patch introduces a way for the 'self' clause to work
as both. I considered making this two separate AST nodes (one for
'self' on 'update' and one for the others), however this makes the
automated macros/etc for supporting a clause break.
Instead, 'self' has the ability to act as either a condition or as a
var-list clause. As this is the only one of its kind, it is implemented
all within it. If in the future we have more that work like this, we
should consider rewriting a lot of the macros that we use to make
clauses work, and make them separate ast nodes.
This executable construct has a larger list of clauses than some of the
others, plus has some additional restrictions. This patch implements
the AST node, plus the 'cannot be the body of a if, while, do, switch,
or label' statement restriction. Future patches will handle the
rest of the restrictions, which are based on clauses.
A fairly simple one, only valid on the 'set' construct, this clause
takes an int expression. Most of the work was already done as a part of
parsing, so this patch ends up being a lot of infrastructure.
The 'set' construct is another fairly simple one, it doesn't have an
associated statement and only a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its for clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled, it needs a complete
new implementation.
This is a very simple sema implementation, and just required AST node
plus the existing diagnostics. This patch adds tests and adds the AST
node required, plus enables it for 'init' and 'shutdown' (only!)
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.
This is a clause that is only valid on 'host_data' constructs, and
identifies variables which it should use the current device address.
From a Sema perspective, the only thing novel here is mild changes to
how ActOnVar works for this clause, else this is very much like the rest
of the 'var-list' clauses.
'delete' is another clause that has very little compile-time
implication, but needs a full AST that takes a var list. This patch
ipmlements it fully, plus adds sufficient test coverage.
This is another new clause specific to 'exit data' that takes a pointer
argument. This patch implements this the same way we do a few other
clauses (like attach) that have the same restrictions.
The 'if_present' clause controls the replacement of addresses in the
var-list in current device memory. This clause can only go on
'host_device'. From a Sema perspective, there isn't anything to do
beyond add this to AST and pass it on.
This is a very simple clause as far as sema is concerned. It is only
valid on 'exit data', and doesn't have any rules involving it, so it is
simply applied and passed onto the MLIR.
These constructs are all very similar and closely related, so this patch
creates the AST nodes for them, serialization, printing/etc.
Additionally the restrictions are all added as tests/todos in the tests,
as those will have to be implemented once we get those clauses implemented.
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
The 'vector' clause specifies the iterations to be executed in vector or
SIMD mode. There are some limitations on which associated compute
contexts may be associated with this and have arguments, but otherwise
this is a fairly unrestricted clause.
It DOES have region limits like 'gang' and 'worker'.
The worker clause specifies iterations of the loop/ that are executed in
parallel by distributing the iterations among the multiple works within
a single gang.
The sema rules for this type are simply that it cannot be combined with
a `kernel` construct with a `num_workers` clause, child `loop` clauses
cannot contain a `gang` or `worker` clause, and that the argument is oly
allowed when associated with a `kernel`.
The 'gang' clause is used to specify parallel execution of loops, thus
has some complicated rules depending on the 'loop's associated compute
construct. This patch implements all of those.
Add the permutation clause for the interchange directive which will be
introduced in the upcoming OpenMP 6.0 specification. A preview has been
published in
[Technical Report12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).
The 'tile' clause shares quite a bit of the rules with 'collapse', so a
followup patch will add those tests/behaviors. This patch deals with
adding the AST node.
The 'tile' clause takes a series of integer constant expressions, or *.
The asterisk is now represented by a new OpenACCAsteriskSizeExpr node,
else this clause is very similar to others.
The 'collapse' clause on a 'loop' construct is used to specify how many
nested loops are associated with the 'loop' construct. It takes an
optional 'force' tag, and an integer constant expression as arguments.
There are many other restrictions based on the contents of the loop/etc,
but those are implemented in followup patches, for now, this patch just
adds the AST node and does basic argument checking on the loop-count.
HLSL output parameters are denoted with the `inout` and `out` keywords
in the function declaration. When an argument to an output parameter is
constructed a temporary value is constructed for the argument.
For `inout` pamameters the argument is initialized via copy-initialization
from the argument lvalue expression to the parameter type. For `out`
parameters the argument is not initialized before the call.
In both cases on return of the function the temporary value is written
back to the argument lvalue expression through an implicit assignment
binary operator with casting as required.
This change introduces a new HLSLOutArgExpr ast node which represents
the output argument behavior. The OutArgExpr has three defined children:
- An OpaqueValueExpr of the argument lvalue expression.
- An OpaqueValueExpr of the copy-initialized parameter.
- A BinaryOpExpr assigning the first with the value of the second.
Fixes#87526
---------
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: John McCall <rjmccall@gmail.com>
By the OpenMP standard, `num_teams` clause can only accept one
expression (for now). In this patch, we extend it to allow to accept
multiple expressions when it is used with `target teams ompx_bare`
construct. This will allow to launch a multi-dim grid, same as CUDA/HIP.
This is a minimal patch to support parsing for "omp assume" directives.
These are meant to be hints to a compiler's optimisers: as such, it is
legitimate (if not very useful) to ignore them. The patch builds on top
of the existing support for "omp assumes" directives (note spelling!).
Unlike the "omp [begin/end] assumes" directives, "omp assume" is
associated with a compound statement, i.e. it can appear within a
function. The "holds" assumption could (theoretically) be mapped onto
the existing builtin "__builtin_assume", though the latter applies to a
single point in the program, and the former to a range (i.e. the whole
of the associated compound statement).
This patch fixes sollve's OpenMP 5.1 "omp assume"-based tests.
Add the reverse directive which will be introduced in the upcoming
OpenMP 6.0 specification. A preview has been published in [Technical
Report 12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).
---------
Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
This commit implements the entirety of the now-accepted [N3017
-Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded by
AST consumers (CodeGen or constant expression evaluators) or expand
embed directive as a comma expression.
This reverts commit
682d461d5a.
---------
Co-authored-by: The Phantom Derpstorm <phdofthehouse@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
This commit implements the entirety of the now-accepted [N3017 -
Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded
by AST consumers (CodeGen or constant expression evaluators) or
expand embed directive as a comma expression.
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
Co-authored-by: Podchishchaeva, Mariya <mariya.podchishchaeva@intel.com>
These three clauses are all quite trivial, as they take no parameters.
They are mutually exclusive, and 'seq' has some other exclusives that
are implemented here.
The ONE thing that isn't implemented is 2.9's restriction (line 2010):
'A loop associated with a 'loop' construct that does not have a 'seq'
clause must be written to meet all the following conditions'.
Future clauses will require similar work, so it'll be done as a
followup.
This patch implements the 'loop' construct AST, as well as the basic
appertainment rule. Additionally, it sets up the 'parent' compute
construct, which is necessary for codegen/other diagnostics.
A 'loop' can apply to a for or range-for loop, otherwise it has no other
restrictions (though some of its clauses do).
'reduction' has a few restrictions over normal 'var-list' clauses:
1- On parallel, a num_gangs can only have 1 argument when combined with
reduction. These two aren't able to be combined on any other of the
compute constructs however.
2- The vars all must be 'numerical data types' types of some sort, or a
'composite of numerical data types'. A list of types is given in the
standard as a minimum, so we choose 'isScalar', which covers all of
these types and keeps types that are actually numeric. Other compilers
don't seem to implement the 'composite of numerical data types', though
we do.
3- Because of the above restrictions, member-of-composite is not
allowed, so any access via a memberexpr is disallowed. Array-element and
sub-arrays (aka array sections) are both permitted, so long as they meet
the requirements of #2.
This patch implements all of these for compute constructs.
device_type, also spelled as dtype, specifies the applicability of the
clauses following it, and takes a series of identifiers representing the
architectures it applies to. As we don't have a source for the valid
architectures yet, this patch just accepts all.
Semantically, this also limits the list of clauses that can be applied
after the device_type, so this implements that as well.
This reverts commit 06f04b2e27f2586d3db2204ed4e54f8b78fea74e.
This reapplies commit c4a9a374749deb5f2a932a7d4ef9321be1b2ae5d.
The build failures were caused by the patch depending on the order of
evaluation of arguments to a function. This reapplication separates out
the capture of one of the values.
This reverts commit c4a9a374749deb5f2a932a7d4ef9321be1b2ae5d.
This and the followup patch keep hitting an assert I wrote on the build
bots in a way that isn't clear. Reverting so I can fix it without a
rush.
device_type, also spelled as dtype, specifies the applicability of the
clauses following it, and takes a series of identifiers representing the
architectures it applies to. As we don't have a source for the valid
architectures yet, this patch just accepts all.
Semantically, this also limits the list of clauses that can be applied
after the device_type, so this implements that as well.
'wait' takes a few int-exprs (well, a series of async-arguments, but
those are effectively just an int-expr), plus a pair of tags. This
patch adds the support for this to the AST, and does the appropriate
semantic analysis for them.
This is a pretty simple clause, it takes an 'async-argument', which
effectively needs to be just parsed as an 'int' argument, since it can
be an arbitrarly integer at runtime (and negative values are legal for
implementation defined values).
This patch also cleans up the async-argument parsing, so 'wait' got some
minor quality-of-life improvements for parsing (both clause and
construct).
These two are very similar to the other 'var-list' variants, except they
require that the type of the variable be a pointer. This patch
implements that restriction.
Like 'copy', these also have alternate names, so this implements that as
well. Additionally, these have an optional tag of either 'readonly' or
'zero' depending on the clause.
Otherwise, this is a pretty rote implementation of the clause, as there
aren't any special rules for it.
Like present, no_create, and first_private, copy is a clause that takes
just a var-list, and follows the same rules as the others.
The one unique part of this clause is that it ALSO supports two
deprecated/backwards-compatibility spellings, so this patch adds them
and implements them.
The private clause is the first that takes a 'var-list', thus this has a
lot of additional work to enable the var-list type. A 'var' is a
traditional variable reference, subscript, member-expression, or
array-section, so checking of these is pretty minor.
Note: This ran into some issues with array-sections (aka sub-arrays)
that will be fixed in a follow-up patch.
OpenACC is going to need an array sections implementation that is a
simpler version/more restrictive version of the OpenMP version.
This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`,
then adds an enum to choose between the two.
This also fixes a couple of 'drive-by' issues that I discovered on the way,
but leaves the OpenACC Sema parts reasonably unimplemented (no semantic
analysis implementation), as that will be a followup patch.
num_gangs takes an 'int-expr-list', for 'parallel', and an 'int-expr'
for 'kernels'. This patch changes the parsing to always parse it as an
'int-expr-list', then correct the expression count during Sema. It also
implements the rest of the semantic analysis changes for this clause.
The 'vector_length' clause is semantically identical to the
'num_workers' clause, in that it takes a mandatory single int-expr. This
is implemented identically to it.
`self` clauses on compute constructs take an optional condition
expression. We again limit the implementation to ONLY compute constructs
to ensure we get all the rules correct for others. However, this one
will be particularly complicated, as it takes a `var-list` for `update`,
so when we get to that construct/clause combination, we need to do that
as well.
This patch also furthers uses of the `OpenACCClauses.def` as it became
useful while implementing this (as well as some other minor refactors as
I went through).
Finally, `self` and `if` clauses have an interaction with each other, if
an `if` clause evaluates to `true`, the `self` clause has no effect.
While this is intended and can be used 'meaningfully', we are warning on
this with a very granular warning, so that this edge case will be
noticed by newer users, but can be disabled trivially.