70 Commits

Author SHA1 Message Date
Johannes Doerfert
b316126887 [OpenMP][FIX] Avoid races in the handling of to be deleted mapping entries
If we decided to delete a mapping entry we did not act on it right away
but first issued and waited for memory copies. In the meantime some
other thread might reuse the entry. While there was some logic to avoid
colliding on the actual "deletion" part, there were two races happening:

1) The data transfer back of the thread deleting the entry and
   the data transfer back of the thread taking over the entry raced.
2) The update to the shadow map happened regardless if the entry was
   actually reused by another thread which left the shadow map in a
   inconsistent state.

To fix both issues we will now update the shadow map and delete the
entry only if we are sure the thread is responsible for deletion, hence
no other thread took over the entry and reused it. We also wait for a
potential former data transfer from the device to finish before we issue
another one that would race with it.

Fixes https://github.com/llvm/llvm-project/issues/54216

Differential Revision: https://reviews.llvm.org/D121058
2022-03-28 22:33:18 -05:00
Johannes Doerfert
7dfad948f1 [OpenMP][FIX] Repair ExclusiveAccess move semantic snafu 2022-03-25 16:00:53 -05:00
Johannes Doerfert
4e34f061d6 [OpenMP][FIX] Ensure exclusive access to the HDTT map
This patch solves two problems with the `HostDataToTargetMap` (HDTT
map) which caused races and crashes before:

1) Any access to the HDTT map needs to be exclusive access. This was not
   the case for the "dump table" traversals that could collide with
   updates by other threads. The new `Accessor` and `ProtectedObject`
   wrappers will ensure we have a hard time introducing similar races in
   the future. Note that we could allow multiple concurrent
   read-accesses but that feature can be added to the `Accessor` API
   later.
2) The elements of the HDTT map were `HostDataToTargetTy` objects which
   meant that they could be copied/moved/deleted as the map was changed.
   However, we sometimes kept pointers to these elements around after we
   gave up the map lock which caused potential races again. The new
   indirection through `HostDataToTargetMapKeyTy` will allows us to
   modify the map while keeping the (interesting part of the) entries
   valid. To offset potential cost we duplicate the ordering key of the
   entry which avoids an additional indirect lookup.

We should replace more objects with "protected objects" as we go.

Differential Revision: https://reviews.llvm.org/D121057
2022-03-25 11:38:54 -05:00
Jon Chesterfield
75779435f3 [nfc][openmp] Swap arguments to remove noise from upcoming diff 2022-03-11 23:08:37 +00:00
Johannes Doerfert
10aa83ff74 [OpenMP] Allow to explicitly deinitialize device resources
There are two problems this patch tries to address:
1) We currently free resources in a random order wrt. plugin and
   libomptarget destruction. This patch should ensure the CUDA plugin
   is less fragile if something during the deinitialization goes wrong.
2) We need to support (hard) pause runtime calls eventually. This patch
   allows us to free all associated resources, though we cannot
   reinitialize the device yet.

Follow up patch will associate one event pool per device/context.

Differential Revision: https://reviews.llvm.org/D120089
2022-03-07 23:43:04 -06:00
Johannes Doerfert
307bbd3c82 [OpenMP][NFCI] Use RAII lock guards in libomptarget where possible
Differential Revision: https://reviews.llvm.org/D121060
2022-03-07 23:43:04 -06:00
Johannes Doerfert
7ead7e90fc Revert "[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible"
This reverts commit ff50e81b500800708db927cbccca2ab52ec11884 as it broke
the buildbots, see https://reviews.llvm.org/D121060#3362737.
2022-03-06 21:27:41 -06:00
Johannes Doerfert
ff50e81b50 [OpenMP][NFCI] Use RAII lock guards in libomptarget where possible
Differential Revision: https://reviews.llvm.org/D121060
2022-03-06 19:59:23 -06:00
Sri Hari Krishna Narayanan
f44e41af41 Runtime for Interop directive
This implements the runtime portion of the interop directive.
It expects the frontend and IRBuilder portions to be in place
for proper execution. It currently works only for GPUs
and has several TODOs that should be addressed going forward.

Reviewed By: RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D106674
2022-01-27 15:16:24 -05:00
Johannes Doerfert
b0789a1b12 [OpenMP] Avoid costly shadow map traversals whenever possible
In the OpenMC app we saw `omp target update` spending an awful lot of
time in the shadow map traversal without ever doing any update there.
There are two cases that allow us to avoid the traversal completely.
The simplest thing is that small updates cannot (reasonably) contain
an attached pointer part. The other case requires to track in the
mapping table if an entry might contain an attached pointer as part.
Given that we have a single location shadow map entries are created,
the latter is actually fairly easy as well.

Differential Revision: https://reviews.llvm.org/D113124
2022-01-19 22:14:41 -06:00
Johannes Doerfert
1e447d03e2 [OpenMP] Introduce an environment variable to disable atomic map clauses
Atomic handling of map clauses was introduced to comply with the OpenMP
standard (see D104418). However, many apps won't need this feature which
can be costly in certain situations. To allow for applications to
opt-out we now introduce the `LIBOMPTARGET_MAP_FORCE_ATOMIC` environment
flag that voids the atomicity guarantee of the standard for map clauses
again, shifting the burden to the user.

This patch also de-duplicates the code that introduces the events used
to enforce atomicity as a cleanup.

Differential Revision: https://reviews.llvm.org/D117627
2022-01-19 22:14:41 -06:00
Shilei Tian
9584c6fa2f [OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement
The async data movement can cause data race if the target supports it.
Details can be found in [1]. This patch tries to fix this problem by attaching
an event to the entry of data mapping table. Here are the details.

For each issued data movement, a new event is generated and returned to `libomptarget`
by calling `createEvent`. The event will be attached to the corresponding mapping table
entry.

For each data mapping lookup, if there is no need for a data movement, the
attached event has to be inserted into the queue to gaurantee that all following
operations in the queue can only be executed if the event is fulfilled.

This design is to avoid synchronization on host side.

Note that we are using CUDA terminolofy here. Similar mechanism is assumped to
be supported by another targets. Even if the target doesn't support it, it can
be easily implemented in the following fall back way:
- `Event` can be any kind of flag that has at least two status, 0 and 1.
- `waitEvent` can directly busy loop if `Event` is still 0.

My local test shows that `bug49334.cpp` can pass.

Reference:
[1] https://bugs.llvm.org/show_bug.cgi?id=49940

Reviewed By: grokos, JonChesterfield, ye-luo

Differential Revision: https://reviews.llvm.org/D104418
2022-01-05 20:20:04 -05:00
Johannes Doerfert
73104ad65b [OpenMP][NFC] Move headers into include folder 2021-12-28 23:53:28 -06:00
Jon Chesterfield
38af5b4fd1 [libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches 2021-12-17 22:28:31 +00:00
Joseph Huber
208f900527 [Libomptarget] Add an external interface to dynamic shared memory
This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
``llvm_omp_get_dynamic_shared``. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110957
2021-10-08 15:36:57 -04:00
Jon Chesterfield
0c554a4769 [libomptarget] Move device environment to shared header, remove divergence
Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069
2021-10-07 12:03:48 +01:00
Ye Luo
2187cbf56f [OpenMP][libomptarget] Add __tgt_target_return_t enum for __tgt_target_XXX return int
The defintion of OFFLOAD_SUCCESS and OFFLOAD_FAIL used in plugin APIs and libomptarget public APIs are not consistent.
Create __tgt_target_return_t for libomptarget public APIs.

Differential Revision: https://reviews.llvm.org/D109304
2021-09-10 16:11:08 -05:00
Joel E. Denny
ec1ebcd302 [OpenMP][OpenACC] Implement ompx_hold map type modifier extension in runtime (2/2)
This patch implements OpenMP runtime support for an original OpenMP
extension we have developed to support OpenACC: the `ompx_hold` map
type modifier.  The previous patch in this series, D106509, implements
Clang support and documents the new functionality in detail.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D106510
2021-08-31 16:13:49 -04:00
Shilei Tian
29df4ab3f3 [OpenMP][Offloading] Add support for event related interfaces
This patch adds the support form event related interfaces, which will be used
later to fix data race. See D104418 for more details.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D108528
2021-08-28 16:24:14 -04:00
Jose M Monsalve Diaz
313c523995 [OpenMP][Tool] Introducing the llvm-omp-device-info tool
This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and interface to query the device info from all the
available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`

Since omptarget usually requires a description structure with executable
kernels, I split the initialization of the RTLs and Devices to be able to
initialize all possible devices and query each of them.

This revision relies on the patch that introduces the print device info.

A limitation is that the order in which the devices are initialized, and the
corresponding device ID is not necesarily the one seen by OpenMP.

The changes are as follows:
1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function
2. Create an `initAllRTLs` method that initializes all available RTLs at runtime
3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.

Example Output:
```
Device (0):
    print_device_info not implemented

Device (1):
    print_device_info not implemented

Device (2):
    print_device_info not implemented

Device (3):
    print_device_info not implemented

Device (4):
    CUDA Driver Version:                11000
    CUDA Device Number:                 0
    Device Name:                        Quadro P1000
    Global Memory Size:                 4236312576 bytes
    Number of Multiprocessors:          5
    Concurrent Copy and Execution:      Yes
    Total Constant Memory:              65536 bytes
    Max Shared Memory per Block:        49152 bytes
    Registers per Block:                65536
    Warp Size:                          32 Threads
    Maximum Threads per Block:          1024
    Maximum Block Dimensions:           1024, 1024, 64
    Maximum Grid Dimensions:            2147483647 x 65535 x 65535
    Maximum Memory Pitch:               2147483647 bytes
    Texture Alignment:                  512 bytes
    Clock Rate:                         1480500 kHz
    Execution Timeout:                  Yes
    Integrated Device:                  No
    Can Map Host Memory:                Yes
    Compute Mode:                       DEFAULT
    Concurrent Kernels:                 Yes
    ECC Enabled:                        No
    Memory Clock Rate:                  2505000 kHz
    Memory Bus Width:                   128 bits
    L2 Cache Size:                      1048576 bytes
    Max Threads Per SMP:                2048
    Async Engines:                      Yes (2)
    Unified Addressing:                 Yes
    Managed Memory:                     Yes
    Concurrent Managed Memory:          Yes
    Preemption Supported:               Yes
    Cooperative Launch:                 Yes
    Multi-Device Boars:                 No
    Compute Capabilities:               61
```

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106752
2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz
d2f85d0910 [OpenMP][Libomptarget] Adding print_device_info to RTL and omptarget
This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the omplibrary directly.
It is inspired by PGI's pgaccelinfo.

The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented
3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`
4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106751
2021-07-27 21:47:57 -04:00
Jon Chesterfield
6e9cd3e9f1 [libomptarget][nfc] Improve static assert message in dlwrap
Revision of D102858. Raise dlwrap arity argument to template argument
so the correct value is given in the error message. E.g. '2 == 1' instead of
'2 == trait<>::nargs'.

Arity higher than it should be:
Before diff
```
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: error:
      static_assert failed due to requirement '2 == trait<cudaError_enum (*)(unsigned int)>::nargs'
      "Arity Error"
DLWRAP_INTERNAL(cuInit, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~
...
$/include/dlwrap.h:166:3: note: expanded from macro
      'DLWRAP_COMMON'
  static_assert(ARITY == trait<decltype(&SYMBOL)>::nargs, "Arity Error");      \
```

After diff
In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
```
$/include/dlwrap.h:131:3: error: static_assert failed due to
      requirement '2UL == 1UL' "Arity Error"
  static_assert(Requested == Required, "Arity Error");
  ^             ~~~~~~~~~~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in
      instantiation of function template specialization 'dlwrap::verboseAssert<2UL, 1UL>' requested
      here
DLWRAP_INTERNAL(cuInit, 2);
```

Arity lower than it should be:
Before diff
```
$/plugins/cuda/dynamic_cuda/cuda.cpp:131:10: error: no
      matching function for call to 'dlwrap_cuInit'
  return dlwrap_cuInit(X);
         ^~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: candidate
      function not viable: requires 0 arguments, but 1 was provided
DLWRAP_INTERNAL(cuInit, 0);
```

After diff
In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
```
$/include/dlwrap.h:131:3: error: static_assert failed due to
      requirement '0UL == 1UL' "Arity Error"
  static_assert(Requested == Required, "Arity Error");
  ^             ~~~~~~~~~~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in
      instantiation of function template specialization 'dlwrap::verboseAssert<0UL, 1UL>' requested
      here
DLWRAP_INTERNAL(cuInit, 0);
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106543
2021-07-22 15:24:20 +01:00
Joseph Huber
df965513a9 [OpenMP] Add an information flag for device data transfers
This patch adds an information flag that indicated when data is being copied to
and from the device. This will be helpful for finding redundant or unnecessary
data transfers in applications.

Reviewed By: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D103927
2021-06-08 20:23:27 -04:00
Jon Chesterfield
68b88ae670 [libomptarget] Improve dlwrap compile time error diagnostic
[libomptarget] Improve dlwrap compile time error diagnostic

The dlwrap interface takes an explict arity, e.g. DLWRAP(cuAlloc, 2);
This probably can't be eliminated as it controls the argument list of an
external symbol, not an inline header function. If the arity given is too
big, the error from clang referring to the line is in the middle of
implementation details.

/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1277:7: error: static_assert failed
      due to requirement '0UL < tuple_size<std::tuple<>>::value' "tuple index is in range"
      static_assert(__i < tuple_size<tuple<>>::value,
      ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ...
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:93:27 ...

/home/amd/llvm-project/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp:34:1: note: in
      instantiation of template class 'dlwrap::trait<cudaError_enum (*)(unsigned long *, unsigned
      long)>::arg<2>' requested here
DLWRAP(cuMemAlloc, 3);
^
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:51:31: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:166:3: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:133:3: ...
/home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:186:37: ...

If the arity is too small, the diagnostic is better:

cuda/dynamic_cuda/cuda.cpp:34:1: error: too few
      arguments to function call, expected 2, have 1
DLWRAP(cuMemAlloc, 1);

This patch changes the diagnostic to:

cuda/dynamic_cuda/cuda.cpp:34:1: error:
      static_assert failed due to requirement '1 == trait<cudaError_enum (*)(unsigned long *, unsigned
      long)>::nargs' "Arity Error"
DLWRAP(cuMemAlloc, 1);

or

cuda/dynamic_cuda/cuda.cpp:34:1: error:
      static_assert failed due to requirement '3 == trait<cudaError_enum (*)(unsigned long *, unsigned
      long)>::nargs' "Arity Error"
DLWRAP(cuMemAlloc, 3);

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102858
2021-05-20 20:33:36 +01:00
Joseph Huber
59b6849012 [OpenMP] Replace global InfoLevel with a reference to an internal one.
Summary:
This patch improves the implementation of D100774 by replacing the global
variable introduced with a function that returns a reference to an internal
one. This removes the need to define the variable in every plugin that uses it.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D101102
2021-04-23 09:43:46 -04:00
Joseph Huber
2b6f20082e [OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime
Summary:
This patch adds a new runtime function __tgt_set_info_flag that allows the
user to set the information level at runtime without using the environment
variable. Using this will require an extern function, but will eventually be
added into an auxilliary library for OpenMP support functions.

This patch required moving the current InfoLevel to a global variable which must
be instantiated by each plugin.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100774
2021-04-22 12:48:11 -04:00
Hansang Bae
9b98497b44 [OpenMP] Add omp_target_is_accessible() to header files
-- Added omp_target_is_accessible to the header files
-- Added missing const qualifier to device memory routines

Differential Revision: https://reviews.llvm.org/D100420
2021-04-16 07:54:15 -05:00
Joseph Huber
83d4b2e2e0 [OpenMP] Add info for device table changes
Summary:
This patch adds a feature to print information whenever the host-device pointer
mapping table is changed by inserting or removing an entry. This introduces a
new bit field for LIBOMPTARGET_INFO at position 0x8.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100600
2021-04-15 18:39:48 -04:00
George Rokos
2468fdd9af [libomptarget] Add allocator support for target memory
This patch adds the infrastructure for allocator support for target memory.
Three allocators are introduced for device, host and shared memory.
The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard.

Differential Revision: https://reviews.llvm.org/D97883
2021-03-13 03:47:07 -08:00
Johannes Doerfert
5449fbb5d4 [OpenMP][NFC] Use AsyncInfo as the variable name for a __tgt_async_info
Reviewed By: grokos, tianshilei1992

Differential Revision: https://reviews.llvm.org/D96444
2021-03-11 23:31:34 -06:00
Joseph Huber
807466ef28 [OpenMP] Restore backwards compatibility for libomptarget
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.

Reviewed By: jdoerfert cchen

Differential Revision: https://reviews.llvm.org/D98358
2021-03-11 09:52:11 -05:00
Vyacheslav Zakharin
6baeeb9efa [libomptarget] Fixed MSVC build fail caused by __attribute__((used)).
Differential Revision: https://reviews.llvm.org/D97348
2021-02-24 09:59:39 -08:00
Manoel Roemmer
542d9c2154 [libomptarget] Load images in order of registration
This makes sure that images are loaded in the order in which they are registered with libomptarget.

If a target can load multiple images and these images depend on each other (for example if one image contains the programs target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin).
In this case: because the same code exist in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95530
2021-02-24 18:15:41 +01:00
Johannes Doerfert
2518cc65d2 [OpenMP][FIX] Avoid use of stack allocations in asynchronous calls
As reported by Guilherme Valarini [0], we used to pass stack allocations
to calls that can nowadays be asynchronous. This is arguably a problem
and it will inevitably result in UB. To remedy the situation we
allocate the locations as part of the AsyncInfoTy object. The lifetime
of that object matches what we need for now. If the synchronization is
not tied to the AsyncInfoTy object anymore we might need to have a
different buffer construct in global space.

This should be back-ported to LLVM 12 but needs slight modifications as
it is based on refactoring patches we do not need to backport.

[0] https://lists.llvm.org/pipermail/openmp-dev/2021-February/003867.html

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D96667
2021-02-16 15:38:11 -06:00
Johannes Doerfert
a2fc0d34db [OpenMP] Move synchronization into __tgt_async_info
The AsyncInfo should be passed everywhere and it should offer a way to
ensure synchronization, given a libomptarget Device.

This replaces D96431.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D96438
2021-02-16 15:38:01 -06:00
Johannes Doerfert
9cd1e2228c [OpenMP][NFC] Clang format libomptarget code (src & include)
The struct and enum alignments are kept by disabling clang-format for
that code region.

Reviewed By: tianshilei1992, JonChesterfield, grokos

Differential Revision: https://reviews.llvm.org/D96428
2021-02-16 15:37:35 -06:00
Vyacheslav Zakharin
3caa2d3354 [libomptarget][NFC] Avoid gcc 5/6 issue with lambda captures.
Differential Revision: https://reviews.llvm.org/D95486
2021-01-26 16:06:58 -08:00
Johannes Doerfert
8c7fdc4c61 [OpenMP] Add source location information to the libomptarget profile
In much of the libomptarget interface we have an ident_t object now, if
it is not null we can use it to improve the profile output. For now, we
simply use the ident_t "source information string" as generated by the
FE.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95282
2021-01-25 22:43:43 -06:00
Joseph Huber
93eef7d8e9 [OpenMP][NFC] Fix SourceInfo.h variable names
Summary:
Fix the names to use Pascal case to comply with the LLVM coding guidelines. `ident_t` is required for compatibility with the rest of libomp.
2021-01-25 12:43:34 -05:00
Jon Chesterfield
47e95e87a3 [libomptarget] Build cuda plugin without cuda installed locally
[libomptarget] Build cuda plugin without cuda installed locally

Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used.

This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp.

The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95155
2021-01-23 00:15:04 +00:00
Joseph Huber
119a9ea13f [OpenMP] Fix failing test due to change in offloading flags
Summary:
Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95133
2021-01-21 14:09:36 -05:00
Joseph Huber
fe5d51a489 [OpenMP] Add using bit flags to select Libomptarget Information
Summary:
This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.

Reviewers: jdoerfert JonChesterfield grokos

Differential Revision: https://reviews.llvm.org/D93727
2021-01-04 12:03:15 -05:00
cchen
7036fe8a0c [libomptarget] Add support for target update non-contiguous
This patch is the runtime support for https://reviews.llvm.org/D84192.

In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload arg when
the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The origin arg is for
passing the pointer information, however, the overloaded arg is an
array of descriptor_dim:

```
struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride
};
```

and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.

Reviewed By: grokos, tianshilei1992

Differential Revision: https://reviews.llvm.org/D82245
2020-11-19 11:33:27 -06:00
Joseph Huber
da8bec47ab [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert grokos

Differential Revision: https://reviews.llvm.org/D87946
2020-11-19 12:01:53 -05:00
Joseph Huber
5378c6a4bf [OpenMP] Add Support for Mapping Names in Libomptarget RTL
Summary:
This patch adds basic support for priting the source location and names for the mapped variables. This patch does not support names for custom mappers. This is based on D89802.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D90172
2020-11-18 16:01:59 -05:00
Joseph Huber
97e55cfef5 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-11-18 15:28:39 -05:00
Alexey Bataev
dcde6f17fd Revert "[libomptarget] Add support for target update non-contiguous"
This reverts commit 6847bcec1aa9e262e2b175926d94a12fc1174c6d. It breaks
the build of libomptarget.
2020-11-10 07:49:00 -08:00
cchen
6847bcec1a [libomptarget] Add support for target update non-contiguous
This patch is the runtime support for https://reviews.llvm.org/D84192.

In order not to modify the tgt_target_data_update information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload arg when
the maptype is set as OMP_TGT_MAPTYPE_DESCRIPTOR. The origin arg is for
passing the pointer information, however, the overloaded arg is an
array of descriptor_dim:

```
struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride
};
```

and the array size is the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
arg_size parameter by using dimension size.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D82245
2020-11-06 20:55:33 -06:00
Benjamin Kramer
207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b7581efc3ef378709042100e75da0185a0 and
a87d7b3d448a16e416d1980b9d6aea99e4c9900b. Test fails under msan.
2020-10-28 13:58:14 +01:00
Joseph Huber
d981c7b758 [OpenMP] Add Support for Mapping Names in Libomptarget RTL
Summary:
This patch adds basic support for priting the source location and names for the
mapped variables. This patch does not support names for custom mappers. This is
based on D89802. The names information currently will be printed out only in
debug mode or using env LIBOMPTARGET_INFO during execution. But the information
is added when availible to the Device and Private data structures. To get the
information out the code must be built with debug symbols on using -g or
-Rpass=openmp-opt

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D90172
2020-10-27 16:53:05 -04:00