OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary host runtime changes.
Reland https://github.com/llvm/llvm-project/pull/146403. After manual
testing on a gfx90a machine, I could not reproduce the failing test,
which makes it even more likely that the test has just been flaky. (Or
at least that it's not an issue related to this patch.)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary host runtime changes.
When running the `openmp` testsuite on 32-bit SPARC, several tests
`FAIL` apparently randomly, but always with the same kind of error:
```
# error: command failed with exit status: -11
```
The tests die with `SIGBUS`, as can be seen in `truss` output:
```
26461/1: Incurred fault #5, FLTACCESS %pc = 0x00010EAC
26461/1: siginfo: SIGBUS BUS_ADRALN addr=0x0013D12C
26461/1: Received signal #10, SIGBUS [default]
26461/1: siginfo: SIGBUS BUS_ADRALN addr=0x0013D12C
```
i.e. the code is trying an unaligned access which cannot work on SPARC,
a strict-alignment target which enforces natural alignment on access.
This explains the apparent randomness of the failures: if the memory
happens to be aligned appropriately, the tests work, but fail if not.
A `Debug` build reveals much more:
- `__kmp_alloc` currently aligns to `sizeof(void *)`, which isn't enough
on strict-alignment targets when the data are accessed as types
requiring larger alignment. Therefore, this patch increases `alignment`
to `SizeQuant`.
- 32-bit Solaris/sparc `libc` guarantees 8-byte alignment from `malloc`,
so this patch adjusts `SizeQuant` to match.
- There's a `SIGBUS` in
```
__kmpc_fork_teams (loc=0x112f8, argc=0,
microtask=0x16cc8
<__omp_offloading_ffbc020a_4b1abe_main_l9_debug__.omp_outlined>)
at openmp/runtime/src/kmp_csupport.cpp:573
573 *(kmp_int64 *)(&this_thr->th.th_teams_size) = 0L;
```
Casting to a pointer to a type requiring 64-bit alignment when that
isn't guaranteed is wrong. Instead, this patch uses `memset` instead.
- There's another `SIGBUS` in
```
0xfef8cb9c in __kmp_taskloop_recur (loc=0x10cb8, gtid=0, task=0x23cd00,
lb=0x23cd18, ub=0x23cd20, st=1, ub_glob=499, num_tasks=100, grainsize=5,
extras=0, last_chunk=0, tc=500, num_t_min=20,
codeptr_ra=0xfef8dbc8 <__kmpc_taskloop(ident_t*, int, kmp_task_t*, int,
kmp_uint64*, kmp_uint64*, kmp_int64, int, int, kmp_uint64, void*)+240>,
task_dup=0x0)
at openmp/runtime/src/kmp_tasking.cpp:5147
5147 p->st = st;
```
`p->st` doesn't currently guarantee the 8-byte alignment required by
`kmp_int64 st`. `p` is set in
```
__taskloop_params_t *p = (__taskloop_params_t *)new_task->shareds;
```
but `shareds_offset` is currently aligned to `sizeof(void *)` only.
Increasing it to `sizeof(kmp_uint64)` to match its use fixes the
`SIGBUS`.
With these fixes I get clean `openmp` test results on 32-bit SPARC (both
Solaris and Linux), with one unrelated exception.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
This can happen in static destructors when called after the
runtime is already shutdown (e.g., by ompt_finalize_tool). Even
though it is technically an error to call omp_destroy_lock after
shutdown, the application doesn't necessarily know that omp_destroy_lock
was already called. This is safe becaues all indirect locks are
destoryed in __kmp_cleanup_indirect_user_locks so the return
value will always be valid or a nullptr, not garbage.
Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB, 1000 bytes, and the hardware industry as well as
several operating systems have now switched to using 1000 byte kBs.
If this change is acceptable, sometimes GitHub mangles merges to use the
original email of the account. $dayjob asks contributions have my work
email. Thanks!
Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.
Sample Test Case from OpenMP 6.0 Example
```
#include <assert.h>
#include <omp.h>
#define N 10
void do_red(int n, int *v, int &sum_v)
{
sum_v = 0; // sum_v is private
#pragma omp for reduction(original(private),+: sum_v)
for (int i = 0; i < n; i++)
{
sum_v += v[i];
}
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel num_threads(4)
{
int s_v; // s_v is private
do_red(N, v, s_v);
assert(s_v == 45);
}
return 0;
}
```
Expected Codegen:
```
// A shared global/static variable is introduced for the reduction result.
// This variable is initialized (e.g., using memset or a UDR initializer)
// e.g., .omp.reduction.internal_private_var
// Barrier before any thread performs combination
call void @__kmpc_barrier(...)
// Initialization block (executed by thread 0)
// e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)
call void @__kmpc_critical(...)
// Inside critical section:
// Load the current value from the shared variable
// Load the thread-local private variable's value
// Perform the reduction operation
// Store the result back to the shared variable
call void @__kmpc_end_critical(...)
// Barrier after all threads complete their combinations
call void @__kmpc_barrier(...)
// Broadcast phase:
// Load the final result from the shared variable)
// Store the final result to the original private variable in each thread
// Final barrier after broadcast
call void @__kmpc_barrier(...)
```
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
Summary:
This deletes and changes somet things that are out of date or wrong and
makes the recommended way to build more clear.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
LLVM is moving towards the `target=<target triple RE>` syntax in `XFAIL:
` etc., and I'll need the same in a subsequent patch.
This patch adds the necessary infrastructure.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
This patch adds SPARC infrastructure to the `openmp` `cmake` files,
matching what is done for other architectures.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
Linking `libomp.so` on 32-bit SPARC `FAIL`s with
```
ld: fatal: file projects/openmp/runtime/src/CMakeFiles/omp.dir/z_Linux_asm.S.o: wrong ELF class: ELFCLASS64
```
This was a 1-stage build with a 64-bit-default `gcc`. Unlike the C++
sources, which were compiled as 32-bit objects due to the use of
`-DCMAKE_CXX_FLAGS=-m32`, the assembler sources were not.
This patch simplifies passing `-m32`: instead of doing it per
architecture, `-m32` is now always passed when the target uses 32-bit
pointers and supports the option.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
`openmp` currently doesn't compile on 32-bit Solaris:
```
FAILED: projects/openmp/runtime/src/CMakeFiles/omp.dir/z_Linux_util.cpp.o
[...]
In file included from openmp/runtime/src/z_Linux_util.cpp:78:
In file included from /usr/include/libproc.h:25:
In file included from /usr/include/gelf.h:10:
/usr/include/libelf.h:22:2: error: "large files are not supported by libelf"
22 | #error "large files are not supported by libelf"
| ^
In file included from openmp/runtime/src/z_Linux_util.cpp:78:
/usr/include/libproc.h:42:2: error: "Cannot use libproc in the large file compilation environment"
42 | #error "Cannot use libproc in the large file compilation environment"
| ^
```
Looking closer, there's no point in using `Pgrab` (the only interface
from `<libproc.h>`) at all: the resulting `ps_prochandle_t *` isn't used
in the remainder of the code and the original PR #82930 gives no
indication why it is deemed necessary/useful.
While at it, this patch switches to use `/proc/self/xmap`, matching
`compiler-rt`'s `sanitizer_procmaps_solaris.cpp`, and makes some minor
formatting fixes.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`i386-pc-solaris2.11`, and `amd64-pc-solaris2.11`.
Parts of the `openmp` testsuite currently don't build on SPARC due to
the lack of a `print_possible_return_addresses` definition.
This patch provides one. With it, the vast majority of tests `PASS` on
Solaris/sparcv9 and, with an additional patch, on Linux/sparc64.
The current definition was obtained empirically.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
`libomp` uses `__builtin_return_address` in two places. However, on some
targets those calls need to wrapped in `___builtin_extract_return_addr`
to get at the actual return address. SPARC is among those targets and
the only one where `clang` actually implements this, cf. [[clang][Sparc]
Fix __builtin_extract_return_addr
etc.](https://reviews.llvm.org/D91607). `compiler-rt` needed the same
adjustment, cf. [[sanitizer_common][test] Enable tests on
SPARC](https://reviews.llvm.org/D91608). On other targets, this is a
no-op. However, there are more targets that have the same issue and
`gcc`, unlike `clang`, correctly implements it, so there might be issues
when building `libomp` with `gcc`.
This patch adds the necessary calls.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
Only 2 `openmp` testsuite failures remain on Solaris/amd64. They are due
the same issue: the tests in question assume `NULL` pointers to be
printed as `(nil)` while the rest of the testsuite uses `[[NULL]]` for
that.
This patch changes them to follow suit.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
After PR #138514, only 3 testsuite failures remain on Solaris/amd64. One
of them is
```
libomp :: ompt/loadtool/tool_available_search/tool_available_search.c
```
The issue is that the expected message is that emitted by Linux/glibc,
while the Solaris message differs:
On Linux/x86_64, I get
```
Opening projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so... Failed: projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so: cannot open shared object file: No such file or directory
```
while Solaris/amd64 emits
```
Opening projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so... Failed: ld.so.1: tool_available_search.c.tmp: projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so: open failed: No such file or directory
```
Since the exact wording is obviously an implementation detail, this
patch allows for both forms.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
Testing `openmp` on Solaris/amd64 shows a large number of failures, all
due to the same issue:
```
# .---command stderr------------
# | openmp/runtime/test/ompt/misc/interoperability.cpp:67:16: error: CHECK-SAME: expected string not found in input
# | // CHECK-SAME: parent_task_frame.reenter={{0x[0-f]+}}
# | ^
# | <stdin>:5:101: note: scanning from here
# | 281474976710658: ompt_event_parallel_begin: parent_task_id=281474976710659, parent_task_frame.exit=0, parent_task_frame.reenter=7fffbedffe90, parallel_id=281474976710661, requested_team_size=2, codeptr_ra=408b8e, invoker=2
```
The testsuite expects pointers to be printed with a `0x` prefix when
using the `%p` format, while Solaris `libc` just prints them in hex
without a prefix.
However, this difference is completely benign. ISO C (up to C23,
7.23.6.1) states
```
p The argument shall be a pointer to void or a pointer to a character
type. The value of the pointer is converted to a sequence of printing
characters, in an implementation-defined manner.
```
I saw two ways around this:
- replace every instance of `%p` with a macro (`KMP_PTR_FMT`, defined as
`"%p"` or `"0x%p" as appropriate), or
- adjust the testsuite to make the `0x` prefix optional
The second route seemed less intrusive and more readable, so that's what
this patch does. While large, it's also completely mechanical.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
PR #138517 broke the Android LLVM builders: ARM doesn't understand the
`@object` form. As it turns out, one can use `%object` instead, which
does assemble on all targets currently supported by `z_Linux_asm.S`.
Tested by rebuilding `libomp.so` on `sparcv9-sun-solaris2.11`.
`libomp` doesn't currently build on Linux/sparc64 due to lack of
`__NR_sched_setaffinity` and `__NR_sched_getaffinity` definitions.
This patch provides those.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
`libomp.so` currently fails to link on SPARC, both Solaris/sparcv9 and
Linux/sparc64:
```
Undefined first referenced
symbol in file
__kmp_unnamed_critical_addr projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_gsupport.cpp.o
ld: fatal: symbol referencing errors
```
This patch provides the necessary definition. While at it, I noticed
that on non-x86 targets the symbol wasn't marked as `@object`, which
this patch corrects, too.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
Building `openmp` on Solaris/amd64, I get
```
In file included from openmp/runtime/src/kmp_utility.cpp:16:
openmp/runtime/src/kmp_wrapper_getpid.h:47:2: warning: No gettid found, use getpid instead [-W#warnings]
47 | #warning No gettid found, use getpid instead
| ^
```
There's no reason to do this: Solaris can use `pthread_self` just as AIX
does.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
When running the `openmp` testsuite on Solaris/amd64, many tests `FAIL`
like
```
# | OMP: Error #11: Stack overflow detected for OpenMP thread #1
```
In a `Debug` build, I also get
```
# | Assertion failure at kmp_runtime.cpp(203): __kmp_gtid_get_specific() < 0 || __kmp_gtid_get_specific() == i.
```
Further investigation shows that just setting `__kmp_gtid_mode` to 3
massively reduces the number of failures.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
When building `openmp` on Linux/sparc64, I get
```
In file included fromopenmp/runtime/src/kmp_utility.cpp:16:
openmp/runtime/src/kmp_wrapper_getpid.h:47:2: warning: No gettid found, use getpid instead [-W#warnings]
47 | #warning No gettid found, use getpid instead
| ^
```
This is highly confusing since `<sys/syscall.h>` **does** define
`SYS_gettid` and the header is supposed to be included:
```
#if !defined(KMP_OS_AIX) && !defined(KMP_OS_HAIKU)
#include <sys/syscall.h>
#endif
```
However, this actually is **not** the case for two reasons:
- `KMP_OS_HAIKU` is always defined, either as 1 on Haiku or as 0
otherwise.
- `KMP_OS_AIX` is even worse: it is only defined as 1 on on AIX, but
undefined otherwise.
All those `KMP_OS_*` macros are supposed to always be defined as 1/0 as
appropriate, and to be checked with `#if`, not `#ifdef`. AIX is
violating this, causing the problem above.
Other targets probably get `<sys/syscall.h>` indirectly otherwise, but
Linux/sparc64 does not.
This patch fixes this by also defining `KMP_OS_AIX` as 0 on other OSes
and changing the checks to `#if` as necessary.
Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
Despite our attempt (build-docs.sh)
to build the documentation with SVG,
it still uses PNG https://llvm.org/doxygen/classllvm_1_1StringRef.html,
and that renders terribly on any high dpi display.
SVG leads to smasller installation and works fine
on all browser (that has been true for _a while_
https://caniuse.com/svg), so this patch just unconditionally build all
dot graphs as SVG in all subprojects and remove the option.
This commit resolves multiple issues in the OpenMP taskgraph implementation:
- Fix a potential use of uninitialized is_taskgraph and tdg fields when a task is created outside of a taskgraph construct.
- Fix use of task ID field when accessing the taskgraph’s record_map.
- Fix resizing and copying of the successors array when its capacity is exceeded.
Fixes memory management flaws, invalid memory accesses, and uninitialized data risks in taskgraph operations.
TR11 introduced changes to support target memory management in a unified
way by defining a series of API routines and additional traits. Host
runtime is oblivious to how actual memory resources are mapped when
using the new API routines, so it can only support how the composed
memory space is maintained, and the offload backend must handle which
memory resources are actually used to allocate memory from the memory
space.
Here is summary of the implementation.
* Implemented 12 API routines to get/mainpulate memory space/allocator.
* Memory space composed with a list of devices has a state with resource
description, and runtime is responsible for maintaining the allocated
memory space objects.
* Defined interface with offload runtime to access memory resource list,
and to redirect calls to omp_alloc/omp_free since it requires
backend-specific information.
* Value of omp_default_mem_space changed from 0 to 99, and
omp_null_mem_space took the value 0 as defined in the language.
* New allocator traits were introduced, but how to use them is up to the
offload backend.
* Added basic tests for the new API routines.
This patch adds support for memory allocation using hwloc. To enable
memory allocation using hwloc, env KMP_TOPOLOGY_METHOD=hwloc needs to be
used. If hwloc is not supported/available, allocation will fallback to
default path.
This patch attempts to provide a fix for an issue that appears when the
`__kmp_dist_for_static_init` function is called from a serialized team.
This is triggered by code generated by flang for `distribute parallel
do` constructs whenever an `if` clause for the `parallel` leaf construct
is present. This results in the introduction of a call to
`__kmpc_fork_call_if` in place of `__kmpc_fork_call`. When it evaluates
to `false`, it defers execution to `__kmp_serialized_parallel`, which
creates a new serial team that is picked up by
`__kmp_dist_for_static_init`, resulting in an incorrect `team` pointer
that causes the `nteams == (kmp_uint32)team->t.t_parent->t.t_nproc`
assertion to fail.
The sequence of calls replicating this issue can be summarized as:
- `__kmpc_fork_teams`
- `__kmpc_fork_call_if`
- `__kmpc_dist_for_static_init_*`
Since I am not familiar with the implementation of the OpenMP runtime,
it is possible that the above sequence of calls is incorrect, or that
the bug can be better fixed in another way, so I am open to discussing
this.
The following Fortran program can be compiled with flang to show the
issue:
```f90
! Compile and run: flang -fopenmp test.f90 -o test && ./test
! Check LLVM IR: flang -fc1 -emit-llvm -fopenmp test.f90 -o -
program main
implicit none
integer, parameter :: n = 10
integer :: i, idx(n)
!$omp teams
!$omp distribute parallel do if(.false.)
do i=1,n
idx(i) = i
end do
!$omp end teams
print *, idx
end program
```
Updating OpenMP runtime taskgraph support(record/replay mechanism):
- Adds a `graph_reset` bit in `kmp_taskgraph_flags_t` to discard
existing TDG records.
- Switches from a strict index-based TDG ID/IDX to a more flexible
integer-based, which can be any integer (e.g. hashed).
- Adds helper functions like `__kmp_find_tdg`, `__kmp_alloc_tdg`, and
`__kmp_free_tdg` to manage TDGs by their IDs.
These changes pave the way for the integration of OpenMP taskgraph (spec
6.0). Taskgraphs are still recorded in an array with a lookup efficiency
reduced to O(n), where n ≤ `__kmp_max_tdgs`. This can be optimized by
moving the TDGs to a hashtable, making lookups more efficient. The
provided helper routines facilitate easier future optimizations.
This PR creates a new member for task data, which is used to identify
the task in its taskgraph (when ompx taskgraph is enabled).
It aims to remove the overloading of the td_task_id member, which was
used both by the debugger and the taskgraph. This resulted in the
identifier's non-unicity in the case of multiple taskgraphs.
Co-authored-by: Rémy Neveu <rem2007@free.fr>
This PR fixes warning which occurs if one compiles OpenMP runtime with
GCC:
```
warning: comparison of integer expressions of different signedness: 'kmp_intptr_t' {aka 'long int'} and 'long unsigned int' [-Wsign-compare]
```
In several cases the flags entries in ompt_frame_t are not initialized.
According to @jdelsign the address provided as reenter and exit address
is the canonical frame address (cfa) rather than a "framepointer". This
patch makes sure that the flags entry is always initialized and changes
the value from ompt_frame_framepointer to ompt_frame_cfa.
The assertion in the tests makes sure that the flags are always set,
when a tool (callback.h in this case) looks at the value.
Fixes#89058
Switch to using __tsan_init rather than RunningOnValgrind as the means
for detecting TSan instumented binaries. RunningOnValgrind is present in
other libraries (such as Google perftools tcmalloc). An exe that links
with a tcmalloc static library and exports symbols with -rdynamic will
appear to be TSan instrumented even when it is not resulting in "Unable
to fint TSan function ..." messages.
Fixes issue #122319.