2955 Commits

Author SHA1 Message Date
Shilei Tian
600e0cde3e [NFC][OpenMP] clang-format openmp/libomptarget/src/interface.cpp 2023-09-08 23:01:39 -04:00
Brad Smith
7e31b45d6a
[OpenMP] Use the more appropriate function to retrieve the thread id on OpenBSD (#65553)
Use the getthrid() function instead of a syscall.
2023-09-07 21:05:25 -04:00
Shilei Tian
010a5a737b [OpenMP] Fix build issue with libomp when OMPT is disabled 2023-09-06 23:40:24 -04:00
Brad Smith
fd4c80dec9
[OpenMP] Fix gettid warnings on DragonFly (#65549)
Define __kmp_gettid() as appropriate for DragonFly.
2023-09-06 20:21:11 -04:00
Shilei Tian
99d67fb9aa
[OpenMP] Align up the size when calling aligned_alloc (#65525)
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size` is
supposed to be a multiple of `alignment`, and the behavior is
implementation-defined if it is not. We have a non-conformant use in
`kmp_barrier.h` when allocating the distribute barrier. The size of the barrier
is 576 and the alignment is `4*CACHE_LINE`, which is 256 on most systems.
Apparently it works perfectly fine on Linux and Intel-based Macs, but not on
Apple Silicon based Macs.
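
The rounding described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `alignUp` and `allocAligned` are hypothetical names, and the sketch assumes a power-of-two alignment and a C++17 standard library that provides `std::aligned_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Round Size up to the next multiple of Alignment.
// Assumption: Alignment is a power of two, as with 4 * CACHE_LINE.
constexpr std::size_t alignUp(std::size_t Size, std::size_t Alignment) {
  return (Size + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical wrapper: always hand aligned_alloc a conforming size.
// The caller releases the memory with std::free.
void *allocAligned(std::size_t Size, std::size_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return std::aligned_alloc(Alignment, alignUp(Size, Alignment));
}
```

With the numbers from the message, `alignUp(576, 256)` yields 768, so the request becomes three full 256-byte units instead of a size that is not a multiple of the alignment.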

Fix #63194.
2023-09-06 16:28:07 -04:00
Joseph Huber
460840c09d
[OpenMP] Support 'omp_get_num_procs' on the device (#65501)
Summary:
The `omp_get_num_procs()` function should return the amount of
parallelism available. On the GPU, this was not defined. We have elected
to define this function as the maximum number of wavefronts / warps that
can be simultaneously resident on the device. For AMDGPU this is the
number of CUs multiplied by the waves per CU. For NVPTX this is the
maximum threads per SM divided by the warp size and multiplied by the
number of SMs.
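
As a rough sketch of the arithmetic described above (the function and parameter names are hypothetical, and the AMDGPU case assumes the intended factor is the number of wavefronts each CU can hold):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: resident wavefronts on an AMDGPU device.
std::uint32_t amdgpuNumProcs(std::uint32_t NumCUs, std::uint32_t WavesPerCU) {
  return NumCUs * WavesPerCU;
}

// Hypothetical helper: resident warps on an NVPTX device.
std::uint32_t nvptxNumProcs(std::uint32_t NumSMs,
                            std::uint32_t MaxThreadsPerSM,
                            std::uint32_t WarpSize) {
  assert(WarpSize != 0 && "warp size must be non-zero");
  return (MaxThreadsPerSM / WarpSize) * NumSMs;
}
```

For example, with made-up device properties of 80 SMs, 2048 threads per SM, and a warp size of 32, `nvptxNumProcs` returns 64 * 80 = 5120.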
2023-09-06 13:45:05 -05:00
Shilei Tian
ff5c7261ef [OpenMP] Fix a wrong assertion in __kmp_get_global_thread_id
The function assumes that `__kmp_gtid_get_specific` always returns a valid gtid.
That is not always true, because when creating the key for thread-specific data,
a destructor is assigned. The dtor will be called at thread exit. However, before
the dtor is called, the thread-specific data will be reset to NULL first
(https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html):

> At thread exit, if a key value has a non-NULL destructor pointer, and the thread
> has a non-NULL value associated with that key, the value of the key is set to NULL.

This causes `__kmp_gtid_get_specific` to return `KMP_GTID_DNE`.
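
The POSIX behavior quoted above can be observed with a small standalone program fragment. This sketch is independent of libomp: `GtidKey`, `gtidDtor`, and `keyIsNullInsideDestructor` are hypothetical names, and the stored value 5 is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>

static pthread_key_t GtidKey;
static bool KeyWasNullInDtor = false;
static std::intptr_t ValuePassedToDtor = 0;

// Destructor registered with the key. It receives the old value as its
// argument, but the key itself has already been reset to NULL.
static void gtidDtor(void *OldValue) {
  ValuePassedToDtor = reinterpret_cast<std::intptr_t>(OldValue);
  KeyWasNullInDtor = (pthread_getspecific(GtidKey) == nullptr);
}

// Thread body: store a non-NULL value, then exit so the destructor runs.
static void *worker(void *) {
  pthread_setspecific(GtidKey, reinterpret_cast<void *>(std::intptr_t{5}));
  return nullptr;
}

// Returns true if the destructor saw the old value as its argument while
// pthread_getspecific on the same key returned NULL.
bool keyIsNullInsideDestructor() {
  int Rc = pthread_key_create(&GtidKey, gtidDtor);
  assert(Rc == 0);
  pthread_t Thread;
  Rc = pthread_create(&Thread, nullptr, worker, nullptr);
  assert(Rc == 0);
  pthread_join(Thread, nullptr); // synchronizes with the worker's exit
  pthread_key_delete(GtidKey);
  (void)Rc;
  return KeyWasNullInDtor && ValuePassedToDtor == 5;
}
```

Any code reached from the destructor that looks the key up again therefore has to tolerate a NULL result rather than assert on it.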

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D159369
2023-09-06 12:21:43 -04:00
Shilei Tian
518b08c193
[OpenMP] Fix issue of indirect function call in __kmpc_fork_call_if (#65436)
The outlined function is typically invoked via `__kmp_invoke_microtask`, which
is written in asm. D138495 introduced a new interface function for parallel
regions for OpenMPIRBuilder, where the outlined function is called via a
function pointer. For some reason, this works perfectly well on x86 and x86-64
systems, but not on Apple Silicon. The 3rd argument in the callee is always
`nullptr`, even if it is not in the caller. It appears `x2` always contains
`0x0`. This patch adopts the typical method to invoke the function pointer. It
works on my M2 Ultra Mac.

Fix #63194.
2023-09-06 12:17:45 -04:00
Fangrui Song
678e3ee123 [lldb] Fix duplicate word typos; NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 21:32:24 -07:00
Ethan Luis McDonough
2b6ba8c735
[openmp] Tighten flang detection in offloading test
This patch ensures that the locally built version of flang is used when building in-tree. `find_program` sometimes used the wrong executable if a different copy of flang was installed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D159161
2023-09-01 13:59:18 -05:00
Martin Storsjö
c2019c416c [OpenMP] [test] Fix target_thread_limit.cpp to not assume 4 or more cores
Previously, the test ran a section with

    #pragma omp target thread_limit(4)

and expected it to execute exactly 4 times, even though it would
in practice execute min(cores, 4) times.

Increment a counter and check that it executed 1-4 times.
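
A minimal sketch of that kind of check, assuming a compiler with OpenMP 5.1 support for `thread_limit` on `target`. This is not the actual test file, and `countThreadsUnderLimit` and `checkThreadLimit` are hypothetical names. If OpenMP is disabled, the pragmas are ignored and the region runs once, which still falls in the accepted range.

```cpp
#include <cassert>

// Run a parallel region inside a target region limited to 4 threads and
// count how many threads executed it.
int countThreadsUnderLimit() {
  int Count = 0;
#pragma omp target thread_limit(4) map(tofrom : Count)
#pragma omp parallel
  {
#pragma omp atomic
    ++Count;
  }
  return Count;
}

// Accept anything from 1 to 4 executions instead of exactly 4, since the
// region executes min(cores, 4) times in practice.
bool checkThreadLimit() {
  int Count = countThreadsUnderLimit();
  assert(Count >= 1 && "region must run at least once");
  return Count >= 1 && Count <= 4;
}
```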

Differential Revision: https://reviews.llvm.org/D159311
2023-09-01 21:16:58 +03:00
Jan Leyonberg
a0e3418bc8 [flang][OpenMP] Add fortran test with basic target region
This patch adds a test that uses a target region to set a scalar value. It also
adds rules in lit.cfg to handle fortran testing.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D159216
2023-09-01 09:26:36 -04:00
Shilei Tian
35fdf8d703 [OpenMP] Fix a segment fault in __kmp_get_global_thread_id
In `__kmp_get_global_thread_id`, if the gtid mode is 1, after getting the gtid
from TLS, it will store the gtid value to the thread stack maintained in the thread
descriptor. However, `__kmp_get_global_thread_id` can be called when the library
is destructed, after the corresponding thread info has been released. This will
cause a segmentation fault. This can happen on an Intel-based Mac.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D159324
2023-08-31 21:15:28 -04:00
Joseph Huber
ccb1d183c3 [OpenMP][Docs] Remove old entry saying static libraries are unsupported
Summary:
Static libraries have been supported since LLVM 15.0, this entry is
misleading and should be removed.
2023-08-30 06:48:57 -05:00
Ethan Luis McDonough
9e3d59e4c2
[openmp] Fix flang detection for offloading test
This patch fixes the flang detection in the openmp fortran offloading test.

Reviewed By: jsjodin

Differential Revision: https://reviews.llvm.org/D158546
2023-08-29 16:31:03 -05:00
Martin Storsjö
81ecc887aa [OpenMP] Export __kmpc_set_thread_limit on Windows
This fixes the new test target/target_thread_limit.cpp on
Windows, which was added recently in
08bbff4aad57c70a38d5d2680a61901977e66637 /
https://reviews.llvm.org/D152054.

Differential Revision: https://reviews.llvm.org/D159070
2023-08-29 23:22:21 +03:00
Saiyedul Islam
f616c3eeb4
[OpenMP][DeviceRTL][AMDGPU] Support code object version 5
Update DeviceRTL and the AMDGPU plugin to support code
object version 5. Default is code object version 4.

CodeGen for __builtin_amdgpu_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via Xclang option to generate abi-agnostic code.

Generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the abi version, which is available during linking of
user's OpenMP code. Load of this constant gets eliminated
during linking.

AMDGPU plugin queries the ELF for code object version
and then prepares various implicitargs accordingly.

Differential Revision: https://reviews.llvm.org/D139730

Reviewed By: jhuber6, yaxunl
2023-08-29 06:35:44 -05:00
Anton Rydahl
c1b5674fbb [OpenMP] Change OpenMP default version in documentation and help text for -fopenmp-version
As discussed at the weekly OpenMP meeting on the second of August 2023, the default version
in the OpenMP documentation should be changed from OpenMP 5.0 to 5.1.

Differential Revision: https://reviews.llvm.org/D156901
2023-08-28 19:05:55 -07:00
Doru Bercea
2102ed0b91 Fix for openmp tests honoring thread_limit.
Diff: https://reviews.llvm.org/D159001
2023-08-28 13:04:17 -04:00
Doru Bercea
5fe6f56563 Disable intermittently failing OpenMP test.
Diff: https://reviews.llvm.org/D159003
2023-08-28 12:56:22 -04:00
Doru Bercea
41bb5ef11f Add passing test for issue 64797. 2023-08-28 09:55:56 -04:00
Joachim Jenke
1880d8f5c1 [OpenMP][Archer] Add support for taskwait depend
At the moment Archer segfaults due to a null-pointer access, if an application
uses taskwait with depend clause as used in the two new tests.
This patch cleans up the task_schedule function, moves semantic blocks into
functions and replaces the if blocks by a single switch statement. The switch
statement will warn, when new enum values are added in OMPT and makes clear
what code is executed for the different cases.

With free-agent tasks coming up in OpenMP 6.0, we should expect more
null-pointer task_data, so additional null-pointer checks were added.
We also cannot rely on having an implicit task on the stack, so the
BarrierIndex is stored during task creation.

Differential Revision: https://reviews.llvm.org/D158072
2023-08-28 09:43:24 +02:00
Joachim Jenke
cec855af3e [OpenMP][OMPT] Fix ompt_get_task_memory implementation
Since td_allow_completion_event is a member of the taskdata struct, not all
firstprivate/shared variables are stored at the end of the task memory
allocation. Simply report the whole allocation instead.

Furthermore, the function should always return 0, since there is never
another block to report.

Differential Review: https://reviews.llvm.org/D158080
2023-08-28 09:19:52 +02:00
Sandeep Kosuri
08bbff4aad [OpenMP] Codegen support for thread_limit on target directive for host
offloading

- This patch adds support for the thread_limit clause on the target directive according to OpenMP 5.1 [2.14.5]
- The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region.

Differential Revision: https://reviews.llvm.org/D152054
2023-08-26 22:18:49 -05:00
Shilei Tian
fbcce33706 [OpenMP] Honor thread_limit value when choosing grid size
D152014 introduced an optimization that favors more smaller blocks over
fewer larger blocks, even if the user sets `thread_limit` explicitly. This patch
changes the behavior to honor the user's value.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D158802
2023-08-26 22:17:49 -04:00
Joseph Huber
aa78e94b0b [Libomptarget] Support mapping indirect host calls to device functions
The changes in D157738 allowed for us to emit stub globals on the device
in the offloading entry section. These globals contain the addresses of
device functions and allow us to map host functions to their
corresponding device equivalent. This patch provides the initial support
required to build a table on the device to lookup the associated value.
This is done by finding these entries and creating a global table on the
device that can be searched with a simple binary search.
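
The lookup described above can be sketched on the host side as follows. This only illustrates a sorted table searched with binary search. It is not the runtime's actual data layout, and `IndirectEntry` and `lookupDeviceAddr` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table entry: a host function address and its device equivalent.
struct IndirectEntry {
  std::uintptr_t HostAddr;
  std::uintptr_t DeviceAddr;
};

// Binary-search a table sorted by HostAddr. Returns 0 if there is no entry.
std::uintptr_t lookupDeviceAddr(const std::vector<IndirectEntry> &SortedTable,
                                std::uintptr_t HostAddr) {
  assert(std::is_sorted(SortedTable.begin(), SortedTable.end(),
                        [](const IndirectEntry &A, const IndirectEntry &B) {
                          return A.HostAddr < B.HostAddr;
                        }));
  auto It = std::lower_bound(
      SortedTable.begin(), SortedTable.end(), HostAddr,
      [](const IndirectEntry &E, std::uintptr_t Key) {
        return E.HostAddr < Key;
      });
  if (It != SortedTable.end() && It->HostAddr == HostAddr)
    return It->DeviceAddr;
  return 0;
}
```

Sorting the entries once when the table is built keeps each lookup logarithmic in the number of indirect call targets.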

This requires an allocation, which supposedly should be automatically
freed at plugin shutdown. This includes a basic test which looks up device
pointers via a host pointer using the added function. This will need to be built
upon to provide full support for these calls in the runtime.

To support reverse offloading it would also be useful to provide a reverse table
that allows us to get host functions from device stubs.

Depends on D157738

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D157918
2023-08-25 18:51:56 -05:00
Michael Halkenhaeuser
9300b6de3c [OpenMP][OMPT] Add OMPT support for generic-elf-64bit plugin
Fixes: https://github.com/llvm/llvm-project/issues/64487
Connect OMPT during plugin initialization and enable corresponding tests.
Avoid linking OMPT when corresponding support is disabled.

Depends on D158542

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D158543
2023-08-25 13:53:11 -04:00
Johannes Doerfert
a01398156a [OpenMPOpt][FIX] Ensure to propagate information about parallel regions
Before, we checked the parallel region only once, and ignored updates in
the KernelInfo for the parallel region that happened later. This caused
us to think nested parallel sections are not present even if they are,
among other things.
2023-08-25 10:46:56 -07:00
Michael Halkenhaeuser
275259eb9a [OpenMP] Add getComputeUnitKind to generic-elf-64bit plugin
Make the generic-plugin report a corresponding CU kind -- instead of 'unknown'.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D158542
2023-08-25 07:14:43 -04:00
Johannes Doerfert
d2c37fc4f7 [Attributor][FIX] Avoid dangling stack references in map
The old code did not account for new queries during an update, which
caused us to leave stack RQIs in the map. We are now explicit about
temporary vs non-temporary RQIs.

Fixes: https://github.com/llvm/llvm-project/issues/64959
2023-08-24 16:28:10 -07:00
Johannes Doerfert
3611300a32 [OpenMP][FIX] Update tests after D157725 2023-08-24 16:28:10 -07:00
Aaron Jarmusch
1ff0bdb86d [OpenMP] Fix Slice Duplicate in Profiler
Fixed the broken commit - 6579021f02aed021d8cfab808072aa50311e6d12
Fix for the AMDGPU buildbot reported by @jplehr.
2023-08-24 20:52:15 +00:00
Aaron Jarmusch
6579021f02 [OpenMP] Fix Slice Duplicate in Profiler
Using LIBOMPTARGET_PROFILER, duplicates are created from timing both Kernel functions and Data update functions.
I commented out the duplicate timescope and left them in the targetkernel and the targetdataupdate functions. This
way the timescope calls will be closer to the launching of the kernel and the data moving.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D157725
2023-08-24 19:41:37 +00:00
Johannes Doerfert
908ae84351 [OpenMP] Avoid assumptions at the end of a kernel
When we used to treat the kernel end as an aligned barrier, assertions
at the end made sense. Now, they actually cause problems as the "writes"
are not ordered with regards to reads within the kernel. We can simply
get rid of them.
2023-08-23 16:11:43 -07:00
Johannes Doerfert
80906ce48d [OpenMP] Disable early vectorization of loads/stores in the runtime
We are having a hard time optimizing some vectorized loads/stores later
on which causes this optimization to degrade performance.

Differential Revision: https://reviews.llvm.org/D158656
2023-08-23 15:14:14 -07:00
Johannes Doerfert
382b97554d [OpenMP] Force the parallel abstraction to be inlined
This is good for performance and compile time and the indirection (+
switch statements) is nothing that needs to be preserved.
2023-08-23 11:48:18 -07:00
Johannes Doerfert
81a02b0767 [Attributor][NFC] Precommit test 2023-08-23 11:48:18 -07:00
Johannes Doerfert
7481b465ae [OpenMP] Use default grid value for static grid size
If the user did not provide any static clause to override the grid size,
we assume the default grid size as upper bound and use it to improve
code generation through vendor specific attributes.

Fixes: https://github.com/llvm/llvm-project/issues/64816

Differential Revision: https://reviews.llvm.org/D158382
2023-08-23 11:12:03 -07:00
Johannes Doerfert
c5488c8dcc [OpenMP] Properly set static thread limit (w/o analysis)
We used to have two separate implementations to derive the number of
threads used in a target region. This led us to sometimes miss out on
user provided thread bounds (num_threads, or thread_limit) when we
looked for "constant default values". If we might miss out on the
presence of those bounds, we cannot set the thread_limit statically
since the runtime will try to honor user input rather than cap it at the
"preferred default". This patch replaces the secondary implementation
with the primary in a mode that will not emit code but just look for the
presence, and potentially upper bounds, of thread limiting clauses.

The runtime test would not pass without this rewrite as we missed some
clauses, set the static limit on the device to the preferred value, but
then violated that value at runtime.

Fixes: https://github.com/llvm/llvm-project/issues/64845

Differential Revision: https://reviews.llvm.org/D158381
2023-08-23 11:12:03 -07:00
Vadim Paretsky
6789dda762 [OpenMP] make small memory allocations in loop collapse code on the stack
A few places in the loop collapse support code make small dynamic allocations
that introduce a noticeable performance overhead when made on the heap.
This change moves allocations up to 32 bytes to the stack instead of the heap.
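
One common way to express this pattern is a buffer with a small inline array and a heap fallback. The sketch below illustrates the idea only. It is not the code from the patch, and `SmallBuffer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Scratch buffer that lives on the stack for requests of up to 32 bytes
// and falls back to the heap for anything larger.
class SmallBuffer {
  static constexpr std::size_t InlineBytes = 32;
  alignas(std::max_align_t) unsigned char Inline[InlineBytes];
  void *Ptr = nullptr;
  bool OnHeap = false;

public:
  explicit SmallBuffer(std::size_t Bytes) : OnHeap(Bytes > InlineBytes) {
    Ptr = OnHeap ? std::malloc(Bytes) : static_cast<void *>(Inline);
    assert(Ptr != nullptr && "allocation failed");
  }
  ~SmallBuffer() {
    if (OnHeap)
      std::free(Ptr);
  }
  SmallBuffer(const SmallBuffer &) = delete;
  SmallBuffer &operator=(const SmallBuffer &) = delete;

  void *data() { return Ptr; }
  bool onHeap() const { return OnHeap; }
};
```

A `SmallBuffer` declared as a local variable pays for `malloc` and `free` only when the request exceeds the inline capacity.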

Differential Revision: https://reviews.llvm.org/D158220
2023-08-23 10:37:45 -07:00
Jonathan Peyton
99f5969565 [OpenMP] Let primary thread gather topology info for each worker thread
This change has the primary thread create each thread's initial mask
and topology information so it is available immediately after
forking. The setting of mask/topology information is decoupled from the
actual binding. Also add this setting of topology information inside the
__kmp_partition_places mechanism for OMP_PLACES+OMP_PROC_BIND.

Without this, there could be a timing window after the primary
thread signals the workers to fork where worker threads have not yet
established their affinity mask or topology information.

Each worker thread will then bind to the location the primary thread
sets.

Differential Revision: https://reviews.llvm.org/D156727
2023-08-22 15:56:51 -05:00
Michael Halkenhaeuser
57f0bdc8fb [OpenMP][OMPT] Fix target enter data callback ordering & reported device num
This patch fixes: https://github.com/llvm/llvm-project/issues/64738
We observed multiple issues, primarily that the `DeviceId` was reported as -1
in certain scenarios. The reason for this is simply that the device is not
initialized at that point. Hence, we need to move the RAII object creation just
after the `checkDeviceAndCtors`, closer to the actual call we want to observe.

This also solves an ordering issue where one `target enter data` callback would
be executed before the `Init` callback.
Additionally, this change fixes the case where `enter / exit data` and `update`
in conjunction with `nowait` would not result in the emission of an OMPT
callback.

Added a testcase to cover initialized device number and `omp target` constructs.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D157605
2023-08-22 13:12:09 -04:00
Kazu Hirata
11e2975810 Fix typos in documentation 2023-08-18 23:36:04 -07:00
Johannes Doerfert
9c08e76f3e [Attributor] Introduce AAIndirectCallInfo
AAIndirectCallInfo will collect information and specialize indirect call
sites. It is similar to our IndirectCallPromotion but runs as part of
the Attributor (so with assumed callee information). It also expands
more calls and lets the rest of the pipeline figure out what is UB, for
now. We use existing call promotion logic to improve the result,
otherwise we rely on the (implicit) function pointer cast.

This effectively "fixes" #60327 as it will undo the type punning early
enough for the inliner to work with the (now specialized, thus direct)
call.

Fixes: https://github.com/llvm/llvm-project/issues/60327
2023-08-18 16:44:05 -07:00
Terry Wilmarth
f0221fb1d7 [OpenMP] Add option to use different units for blocktime
This change adds the option of using different units for blocktimes specified via the KMP_BLOCKTIME environment variable. The parsing of the environment variable now recognizes the unit suffixes ms and us. If a unit suffix is not specified, the default unit is ms. Thus the default behavior is still the same, and any previous usage still works the same. Internally, blocktime is now converted to microseconds everywhere, so settings that exceed INT_MAX in microseconds are considered "infinite".

kmp_set/get_blocktime are updated to use the units the user specified with KMP_BLOCKTIME, and if not specified, ms are used.

Added better range checking and inform messages for the two time units. Large values of blocktime for default (ms) case (beyond INT_MAX/1000) are no longer allowed, but will autocorrect with an INFORM message.
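
The suffix handling described above can be sketched roughly as follows. This is a simplified illustration rather than the runtime's parser: `parseBlocktimeUs` is a hypothetical name, the sketch returns -1 for malformed input, and it simply clamps out-of-range values to `INT_MAX` where the runtime autocorrects with an INFORM message.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <cstring>

// Parse a KMP_BLOCKTIME-style string into microseconds.
// Accepted forms: "<n>", "<n>ms" (milliseconds, the default), "<n>us".
// Returns -1 for malformed input and INT_MAX ("infinite") when the value
// does not fit in an int once converted to microseconds.
long long parseBlocktimeUs(const char *Text) {
  assert(Text != nullptr);
  char *End = nullptr;
  long long Value = std::strtoll(Text, &End, 10);
  if (End == Text || Value < 0)
    return -1; // no digits, or a negative value

  long long Scale;
  if (std::strcmp(End, "us") == 0)
    Scale = 1;
  else if (*End == '\0' || std::strcmp(End, "ms") == 0)
    Scale = 1000; // no suffix means milliseconds
  else
    return -1; // unknown suffix

  if (Value > INT_MAX / Scale)
    return INT_MAX; // treated as infinite
  return Value * Scale;
}
```

Under these assumptions, `"200"` and `"200ms"` both map to 200000 microseconds, while `"500us"` maps to 500.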

The delay for determining ticks per usec was lowered. It is now 1 million ticks, which was calculated as ~450us based on a 2.2GHz clock, a fairly typical base clock frequency on x86:
(1e6 Ticks)  /  (2.2e9 Ticks/sec)  *  (1e6 usec/sec)  =  454 usec
Really short benchmarks can be affected by a longer delay.

Update KMP_BLOCKTIME docs.

Portions of this commit were authored by Johnny Peyton.

Differential Revision: https://reviews.llvm.org/D157646
2023-08-18 14:01:13 -05:00
Johannes Doerfert
5eb7a427b0 [Attributor][NFC] Precommit tests 2023-08-17 22:42:38 -07:00
Johannes Doerfert
4fcd5f93d6 [OpenMPOpt] Mark more runtime functions as SPMD compatible
Fixes: https://github.com/llvm/llvm-project/issues/64421
2023-08-17 18:33:24 -07:00
Joseph Huber
5717329f1a [Libomptarget] Disable deadlocking bug49334.cpp test on AMDGPU
This test hangs on AMDGPU sporadically, disable it for the time being.

Fixes: https://github.com/llvm/llvm-project/issues/64733

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D158082
2023-08-16 10:24:00 -05:00
Michael Halkenhaeuser
41f3626f8b [OpenMP][OMPT] Fix reported target pointer for data alloc callback
This patch fixes: https://github.com/llvm/llvm-project/issues/64671
DataOp EMI callbacks would not report the correct target pointer.
This is now alleviated by passing a `void**` into the function which
emits the actual callback, then evaluating that pointer.

Note: Since this is only done after the pointer has been properly
updated, only `endpoint=2` callbacks will show a non-null value.

Reviewed By: dhruvachak, jdoerfert

Differential Revision: https://reviews.llvm.org/D157996
2023-08-16 06:39:10 -04:00
Ye Luo
1c822e1e82 [libomptarget] Avoid uninitialized GenericPluginTy::NumDevices 2023-08-13 00:01:50 -05:00