15 Commits

Author SHA1 Message Date
Dominik Adamski
18798cf972
[OpenMP] Add missing weak definitions of missing variables (#77767)
Variables `__omp_rtl_assume_teams_oversubscription` and
`__omp_rtl_assume_threads_oversubscription `are used by functions:
`__kmpc_distribute_static_loop`, `__kmpc_distribute_for_static_loop `and
`__kmpc_for_static_loop`.
2024-01-11 15:28:45 +01:00
Johannes Doerfert
7233e42dff
[OpenMP][NFC] Move Environment.h and SourceInfo.h into "Shared" folder (#73703) 2023-11-28 15:10:06 -08:00
Johannes Doerfert
1cea309b7e [OpenMP][NFC] Move DebugKind to make it reusable from the host 2023-10-20 19:28:09 -07:00
Joseph Huber
460840c09d
[OpenMP] Support 'omp_get_num_procs' on the device (#65501)
Summary:
The `omp_get_num_procs()` function should return the amount of
parallelism availible. On the GPU, this was not defined. We have elected
to define this function as the maximum amount of wavefronts / warps that
can be simultaneously resident on the device. For AMDGPU this is the
number of CUs multiplied byth CU's per wave. For NVPTX this is the
maximum threads per SM divided by the warp size and multiplied by the
number of SMs.
2023-09-06 13:45:05 -05:00
Joseph Huber
aa78e94b0b [Libomptarget] Support mapping indirect host calls to device functions
The changes in D157738 allowed for us to emit stub globals on the device
in the offloading entry section. These globals contain the addresses of
device functions and allow us to map host functions to their
corresponding device equivalent. This patch provides the initial support
required to build a table on the device to lookup the associated value.
This is done by finding these entries and creating a global table on the
device that can be searched with a simple binary search.

This requires an allocation, which supposedly should be automatically
freed at plugin shutdown. This includes a basic test which looks up device
pointers via a host pointer using the added function. This will need to be built
upon to provide full support for these calls in the runtime.

To support reverse offloading it would also be useful to provide a reverse table
that allows us to get host functions from device stubs.

Depends on D157738

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D157918
2023-08-25 18:51:56 -05:00
Joseph Huber
6764301a6b [Libomptarget] Correctly implement getWTime on AMDGPU
AMDGPU provides a fixed frequency clock since some generations back.
However, the frequency is variable by card and must be looked up at
runtime. This patch adds a new device environment line for the clock
frequency so that we can use it in the same way as NVPTX. This is the
correct implementation and the version in ASO should be replaced.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D154456
2023-07-04 21:50:43 -05:00
Johannes Doerfert
2b5a99b3d9 [OpenMP] Rename the _OMP namespace in the device runtime to ompx
Differential Revision: https://reviews.llvm.org/D140334
2022-12-19 14:43:59 -08:00
Joseph Huber
2b8f722e63 [OpenMP] Add option to assert no nested OpenMP parallelism on the GPU
The OpenMP device runtime needs to support the OpenMP standard. However
constructs like nested parallelism are very uncommon in real application
yet lead to complexity in the runtime that is sometimes difficult to
optimize out. As a stop-gap for performance we should supply an argument
that selectively disables this feature. This patch adds the
`-fopenmp-assume-no-nested-parallelism` argument which explicitly
disables the usee of nested parallelism in OpenMP.

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D132074
2022-08-23 14:09:51 -05:00
Jose M Monsalve Diaz
616dd9ae14 [OpenMP] Implementing omp_get_device_num()
This patch implements omp_get_device_num() in the host and the device.

It uses the already existing getDeviceNum in the device config for the device.
And in the host it uses the omp_get_num_devices().

Two simple tests added

Differential Revision: https://reviews.llvm.org/D128347
2022-06-29 02:18:21 -05:00
Joseph Huber
0870a4f59a [OpenMP] Add flag for disabling thread state in runtime
The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it
not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120106
2022-02-18 08:35:05 -05:00
Joseph Huber
6dd791bca8 [OpenMP] Check output of malloc in the device for debug
A common problem is the device running out of global heap memory and
crashing due to a nullptr dereference when using the data sharing stack.
This explicitly checks that a nullptr was not returned by malloc when
debugging field 1 is enabled.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112005
2021-10-29 14:57:12 -04:00
Joseph Huber
277b681ede [OpenMP] Add function tracing debugging to device RTL
This patch adds support for an RAII struct that will print function
traces when placed inside of a function declaration. Each successive
call will increase the indentation to make it easier to visually
inspect.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110202
2021-09-22 12:25:29 -04:00
Joseph Huber
f1c821fa85 [OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006
2021-09-17 21:25:36 -04:00
Joseph Huber
ec02c34b6d [OpenMP] Add additional fields to device environment
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110004
2021-09-17 21:25:32 -04:00
Johannes Doerfert
67ab875ff5 [OpenMP] Prototype opt-in new GPU device RTL
The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only build for nvptx and has "-new" in its
name.

The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime does also avoid users to pay for things they do not use,
especially wrt. memory. The unlikely case of nested parallelism is
supported but costly to make the more likely case use less resources.

The worksharing and reduction implementation have been taken from the
old runtime and will be rewritten in the future if necessary.

Documentation and debug features are still mostly missing and will be
added over time.

All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.

Differential Revision: https://reviews.llvm.org/D106803
2021-07-27 00:56:05 -05:00