A kernel implicit parameter (dyn_ptr) was introduced some time back.
This patch increments the kernel args version for a compiler supporting
dyn_ptr. The version will be used by the runtime to determine whether
the implicit parameter is generated by the compiler. The versioning is
required to support use cases where code generated by an older compiler
is linked with a newer runtime.
If approved, this patch should be backported to release 18.
Summary:
Currently, OpenMP handles the `omp requires` clause by emitting a global
constructor into the runtime for every translation unit that requires
it. However, this is not a great solution because it prevents us from
having a defined order in which the runtime is accessed and used.
This patch changes the approach to no longer use global constructors,
but to instead group the flag with the other offloading entires that we
already handle. This has the effect of still registering each flag per
requires TU, but now we have a single constructor that handles
everything.
This function removes support for the old `__tgt_register_requires` and
replaces it with a warning message. We just had a recent release, and
the OpenMP policy for the past four releases since we switched to LLVM
is that we do not provide strict backwards compatibility between major
LLVM releases now that the library is versioned. This means that a user
will need to recompile if they have an old binary that relied on
`register_requires` having the old behavior. It is important that we
actively deprecate this, as otherwise it would not solve the problem of
having no defined init and shutdown order for `libomptarget`. The
problem of `libomptarget` not having a define init and shutdown order
cascades into a lot of other issues so I have a strong incentive to be
rid of it.
It is worth noting that the current `__tgt_offload_entry` only has space
for a 32-bit integer here. I am planning to overhaul these at some point
as well.
If an inlined kernel is called in a loop, the launch point alloca would
lead to increasing stack usage every time the kernel is invoked. This
could make the application run out of stack space and crash. This problem
is fixed by using the alloca insertion point while creating the alloca instruction.
Fixes https://github.com/llvm/llvm-project/issues/60602
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D145820
This reverts commit 8cf85a0cadb033fed3d96aa5283deb4bfbbaf2c8.
This is add back change of "Add map info for dereference pointer."
In addition turn off test run on amdgpu, since I don't know the way to
reprodue the problem.
This is to fix run time problem when use:
int **a;
map((*a)[:3]), (*a)[1] or map(**a).
current we skip generate map info for dereference pointer:
&(*a), &(*a)[0], 3*sizeof(int), TARGET_PARAM | TO | FROM
One way to fix runtime problem is to generate map info for dereference
pointer.
map((*a)[:3]):
&(*a), &(*a), sizeof(pointer), TARGET_PARAM | TO | FROM
&(*a), &(*a)[0], 3*sizeof(int), PTR_AND_OBJ | TO | FROM
map(**a):
&(*a), &(*a), sizeof(pointer), TARGET_PARAM | TO | FROM
&(*a), &(**a), sizeof(int), PTR_AND_OBJ | TO | FROM
The change in CGOpenMPRuntime.cpp add that.
The change in SemaOpenMP is to fix variable of dereference pointer to array
captured by reference. That is wrong. That cause run time to fail.
The rule is:
If variable is identified in a map clause it is always captured by
reference except if it is a pointer that is dereferenced somehow.
Differential Revision: https://reviews.llvm.org/D145093