Flang currently doesn't build in debug mode on GCC 15 due to missing
dynamic libraries in some CMakeLists.txt files, and OpenMP doesn't link
in debug mode due to the atomic library pulling in libstdc++ despite an
incomplete attempt in the CMakeLists.txt to disable glibcxx assertions.
This PR fixes these issues and allows Flang and the OpenMP runtime to
build and link on GCC 15 in debug mode.
---------
Co-authored-by: ronlieb <ron.lieberman@amd.com>
Description
===========
OpenMP Tooling Interface Testing Library (ompTest) ompTest is a unit
testing framework for testing OpenMP implementations. It offers a
simple-to-use framework that allows a tester to check for OMPT events in
addition to regular unit testing code, supported by linking against
GoogleTest by default. It also facilitates writing concise tests while
bridging the semantic gap between the unit under test and the OMPT-event
testing.
Background
==========
This library has been developed to provide the means of testing OMPT
implementations with reasonable effort. Especially, asynchronous or
unordered events are supported and can be verified with ease, which may
prove to be challenging with LIT-based tests. Additionally, since the
assertions are part of the code being tested, ompTest can reference all
corresponding variables during assertion.
Basic Usage
===========
OMPT event assertions are placed before the code, which shall be tested.
These assertion can either be provided as one block or interleaved with
the test code. There are two types of asserters: (1) sequenced
"order-sensitive" and (2) set "unordered" assserters. Once the test is
being run, the corresponding events are triggered by the OpenMP runtime
and can be observed. Each of these observed events notifies asserters,
which then determine if the test should pass or fail.
Example (partial, interleaved)
==============================
```c++
int N = 100000;
int a[N];
int b[N];
OMPT_ASSERT_SEQUENCE(Target, TARGET, BEGIN, 0);
OMPT_ASSERT_SEQUENCE(TargetDataOp, ALLOC, N * sizeof(int)); // a ?
OMPT_ASSERT_SEQUENCE(TargetDataOp, H2D, N * sizeof(int), &a);
OMPT_ASSERT_SEQUENCE(TargetDataOp, ALLOC, N * sizeof(int)); // b ?
OMPT_ASSERT_SEQUENCE(TargetDataOp, H2D, N * sizeof(int), &b);
OMPT_ASSERT_SEQUENCE(TargetSubmit, 1);
OMPT_ASSERT_SEQUENCE(TargetDataOp, D2H, N * sizeof(int), nullptr, &b);
OMPT_ASSERT_SEQUENCE(TargetDataOp, D2H, N * sizeof(int), nullptr, &a);
OMPT_ASSERT_SEQUENCE(TargetDataOp, DELETE);
OMPT_ASSERT_SEQUENCE(TargetDataOp, DELETE);
OMPT_ASSERT_SEQUENCE(Target, TARGET, END, 0);
#pragma omp target parallel for
{
for (int j = 0; j < N; j++)
a[j] = b[j];
}
```
References
==========
This work has been presented at SC'24 workshops, see:
https://ieeexplore.ieee.org/document/10820689
Current State and Future Work
=============================
ompTest's development was mostly device-centric and aimed at OMPT device
callbacks and device-side tracing. Consequentially, a substantial part
of host-related events or features may not be supported in its current
state. However, we are confident that the related functionality can be
added and ompTest provides a general foundation for future OpenMP and
especially OMPT testing. This PR will allow us to upstream the
corresponding features, like OMPT device-side tracing in the future with
significantly reduced risk of introducing regressions in the process.
Build
=====
ompTest is linked against LLVM's GoogleTest by default, but can also be
built 'standalone'. Additionally, it comes with a set of unit tests,
which in turn require GoogleTest (overriding a standalone build). The
unit tests are added to the `check-openmp` target.
Use the following parameters to perform the corresponding build:
`LIBOMPTEST_BUILD_STANDALONE` (Default: ${OPENMP_STANDALONE_BUILD})
`LIBOMPTEST_BUILD_UNITTESTS` (Default: OFF)
---------
Co-authored-by: Jan-Patrick Lehr <JanPatrick.Lehr@amd.com>
Co-authored-by: Joachim <protze@rz.rwth-aachen.de>
A CMake change included in CMake 4.0 makes `AIX` into a variable
(similar to `APPLE`, etc.)
ff03db6657
However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to
`AIX` and `if` auto-expands variable names in CMake. That means you get
a double expansion if you write:
`if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")`
which becomes:
`if (AIX MATCHES "AIX")`
which is as if you wrote:
`if (ON MATCHES "AIX")`
You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}",
due to policy
[CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054)
which is on by default in 4.0+. Most of the LLVM CMake already does
this, but this PR fixes the remaining cases where we do not.
Fix the `sys.path` logic in the GDB plugin to insert the intended
self-path in the first position rather than appending it to the end. The
latter implied that if `sys.path` (naturally) contained the GDB's
`gdb-plugin` directory, `import ompd` would return the top-level
`ompd/__init__.py` module rather than the `ompd/ompd.py` submodule, as
intended by adding the `ompd/` directory to `sys.path`.
This is intended to be a minimal change necessary to fix the issue.
Alternatively, the code could be modified to import `ompd.ompd` and stop
modifying `sys.path` entirely. However, I do not know why this option
was chosen in the first place, so I can't tell if this won't break
something.
Fixes#153954
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Using hex format allows to better interpret IDs:
the first digits represent the thread number, the last digits represent
the ID within a thread
The main change is in callback.h: PRIu64 -> PRIx64
The patch also guards RUN/CHECK lines in openmp/runtime/tests/ompt with clang-format on/off comments and clang-formats the directory.
---------
Co-authored-by: Kaloyan Ignatov <kaloyan.ignatov@rwth-aachen.de>
Update printf format string to match argument list
---------
Co-authored-by: Joachim <protze@rz.rwth-aachen.de>
Co-authored-by: Joachim Jenke <jenke@itc.rwth-aachen.de>
The following patch introduces a new interop interface implementation
with the following characteristics:
* It supports the new 6.0 prefer_type specification
* It supports both explicit objects (from interop constructs) and
implicit objects (from variant calls).
* Implements a per-thread reuse mechanism for implicit objects to reduce
overheads.
* It provides a plugin interface that allows selecting the supported
interop types, and managing all the backend related interop operations
(init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on
OpenMP synchronizations.
* It cleanups some vendor/fr_id mismatchs from the current query
routines.
* It supports extension to define interop callbacks for library cleanup.
Set LLVM_TREE_AVAILABLE when not defined after #149871. In particular,
the LLVM build tree is obviously available with
`add_subdirectory(openmp)` from the LLVM build tree itself. Note that
this build mode is deprecated since #136314.
This was updated in some earlier commits but was never updated on Darwin
because I was testing locally on Linux and it does not seem like there
are any buildbots testing this configuration. Update it since it should
be trivial and will definitely be broken otherwise.
This was added in 2b8115b10b03013b9f8ae0aa56b0cd6a6a6dd4fd and it looks
like this wass essentially a copy paste from one of the other lit config
files. This substitution is unused within the tests however and contains
a deprecated %T directive, so remove it.
I did not realize that these were originally in separate folders to
allow for the use of %T. Now that we have switched over to creating dirs
using %t, we can move these into a common folder and make things a
little bit more clean.
These were still passing because I did not clear all the test artifacts
in between so the old ones were still present after updating the test. I
forgot to update a lit substitution which failed on clean builds.
This patch removes all uses of %T from lit tests in OpenMP. %T has been
deprecated for years and is not reccomended given it does not create a
unique dir per test, allowing for race conditions. Remove uses of %T in
OpenMP so we can eventually remove support for it in llvm-lit.
The default build of openmp (`cmake -S <llvm-project>/runtimes
-DLLVM_ENABLE_RUNTIMES=openmp`) current fails with
```
CMake Error at /home/meinersbur/src/llvm/flangrt/_src/cmake/Modules/GetClangResourceDir.cmake:17 (string):
string sub-command REGEX, mode MATCH needs at least 5 arguments total to
command.
Call Stack (most recent call first):
/home/meinersbur/src/llvm/flangrt/_src/openmp/CMakeLists.txt:126 (get_clang_resource_dir)
```
The reason is that because it is not a bootstrapping-build, the clang
resource dir that it intends to write files such as `omp-tools.h` into,
is unavailable. Using the Clang resource dir for writing files is
conceptually broken, as that dir might be located in
`/usr/lib/clang/<version>/`. Writing to it is only intended in
bootstrapping builds where Clang is built alongside openmp.
This patch unifies the identification of being in a bootstrapping built.
The same `LLVM_TREE_AVAILABLE` definition is going to be used in
#137828. No reason for each runtime to define its own.
A lot of these only trip when using sanitizers with the library.
* Insert forgotten free()s
* Change (-1) << amount to 0xffffffffu as left shifting a negative is UB
* Fixup integer parser to return INT_MAX when parsing huge string of
digits. e.g., 452523423423423423 returns INT_MAX
* Fixup range parsing for affinity mask so integer overflow does not
occur
* Don't assert when branch bits are 0, instead warn user that is invalid
and use the default value.
* Fixup kmp_set_defaults() so the C version only uses null terminated
strings and the Fortran version uses the string + size version.
* Make sure the KMP_ALIGN_ALLOC is power of two, otherwise use
CACHE_LINE.
* Disallow ability to set KMP_TASKING=1 (task barrier) this doesn't work
and hasn't worked for a long time.
* Limit KMP_HOT_TEAMS_MAX_LEVEL to 1024, an array is allocated based on
this value.
* Remove integer values for OMP_PROC_BIND. The specification only allows
strings and CSV of strings.
* Fix setting KMP_AFFINITY=disabled + OMP_DISPLAY_AFFINITY=TRUE
Ticket lock has a yield operation (shown below) which degrades
performance on larger server machines due to an unconditional pause
operation.
```
#define KMP_YIELD(cond) \
{ \
KMP_CPU_PAUSE(); \
if ((cond) && (KMP_TRY_YIELD)) \
__kmp_yield(); \
}
```
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary host runtime changes.
Reland https://github.com/llvm/llvm-project/pull/146403. After manual
testing on a gfx90a machine, I could not reproduce the failing test,
which makes it even more likely that the test has just been flaky. (Or
at least that it's not an issue related to this patch.)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary host runtime changes.
When running the `openmp` testsuite on 32-bit SPARC, several tests
`FAIL` apparently randomly, but always with the same kind of error:
```
# error: command failed with exit status: -11
```
The tests die with `SIGBUS`, as can be seen in `truss` output:
```
26461/1: Incurred fault #5, FLTACCESS %pc = 0x00010EAC
26461/1: siginfo: SIGBUS BUS_ADRALN addr=0x0013D12C
26461/1: Received signal #10, SIGBUS [default]
26461/1: siginfo: SIGBUS BUS_ADRALN addr=0x0013D12C
```
i.e. the code is trying an unaligned access which cannot work on SPARC,
a strict-alignment target which enforces natural alignment on access.
This explains the apparent randomness of the failures: if the memory
happens to be aligned appropriately, the tests work, but fail if not.
A `Debug` build reveals much more:
- `__kmp_alloc` currently aligns to `sizeof(void *)`, which isn't enough
on strict-alignment targets when the data are accessed as types
requiring larger alignment. Therefore, this patch increases `alignment`
to `SizeQuant`.
- 32-bit Solaris/sparc `libc` guarantees 8-byte alignment from `malloc`,
so this patch adjusts `SizeQuant` to match.
- There's a `SIGBUS` in
```
__kmpc_fork_teams (loc=0x112f8, argc=0,
microtask=0x16cc8
<__omp_offloading_ffbc020a_4b1abe_main_l9_debug__.omp_outlined>)
at openmp/runtime/src/kmp_csupport.cpp:573
573 *(kmp_int64 *)(&this_thr->th.th_teams_size) = 0L;
```
Casting to a pointer to a type requiring 64-bit alignment when that
isn't guaranteed is wrong. Instead, this patch uses `memset` instead.
- There's another `SIGBUS` in
```
0xfef8cb9c in __kmp_taskloop_recur (loc=0x10cb8, gtid=0, task=0x23cd00,
lb=0x23cd18, ub=0x23cd20, st=1, ub_glob=499, num_tasks=100, grainsize=5,
extras=0, last_chunk=0, tc=500, num_t_min=20,
codeptr_ra=0xfef8dbc8 <__kmpc_taskloop(ident_t*, int, kmp_task_t*, int,
kmp_uint64*, kmp_uint64*, kmp_int64, int, int, kmp_uint64, void*)+240>,
task_dup=0x0)
at openmp/runtime/src/kmp_tasking.cpp:5147
5147 p->st = st;
```
`p->st` doesn't currently guarantee the 8-byte alignment required by
`kmp_int64 st`. `p` is set in
```
__taskloop_params_t *p = (__taskloop_params_t *)new_task->shareds;
```
but `shareds_offset` is currently aligned to `sizeof(void *)` only.
Increasing it to `sizeof(kmp_uint64)` to match its use fixes the
`SIGBUS`.
With these fixes I get clean `openmp` test results on 32-bit SPARC (both
Solaris and Linux), with one unrelated exception.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
This can happen in static destructors when called after the
runtime is already shutdown (e.g., by ompt_finalize_tool). Even
though it is technically an error to call omp_destroy_lock after
shutdown, the application doesn't necessarily know that omp_destroy_lock
was already called. This is safe becaues all indirect locks are
destoryed in __kmp_cleanup_indirect_user_locks so the return
value will always be valid or a nullptr, not garbage.
Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB, 1000 bytes, and the hardware industry as well as
several operating systems have now switched to using 1000 byte kBs.
If this change is acceptable, sometimes GitHub mangles merges to use the
original email of the account. $dayjob asks contributions have my work
email. Thanks!
Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.
Sample Test Case from OpenMP 6.0 Example
```
#include <assert.h>
#include <omp.h>
#define N 10
void do_red(int n, int *v, int &sum_v)
{
sum_v = 0; // sum_v is private
#pragma omp for reduction(original(private),+: sum_v)
for (int i = 0; i < n; i++)
{
sum_v += v[i];
}
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel num_threads(4)
{
int s_v; // s_v is private
do_red(N, v, s_v);
assert(s_v == 45);
}
return 0;
}
```
Expected Codegen:
```
// A shared global/static variable is introduced for the reduction result.
// This variable is initialized (e.g., using memset or a UDR initializer)
// e.g., .omp.reduction.internal_private_var
// Barrier before any thread performs combination
call void @__kmpc_barrier(...)
// Initialization block (executed by thread 0)
// e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)
call void @__kmpc_critical(...)
// Inside critical section:
// Load the current value from the shared variable
// Load the thread-local private variable's value
// Perform the reduction operation
// Store the result back to the shared variable
call void @__kmpc_end_critical(...)
// Barrier after all threads complete their combinations
call void @__kmpc_barrier(...)
// Broadcast phase:
// Load the final result from the shared variable)
// Store the final result to the original private variable in each thread
// Final barrier after broadcast
call void @__kmpc_barrier(...)
```
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
Summary:
This deletes and changes somet things that are out of date or wrong and
makes the recommended way to build more clear.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
LLVM is moving towards the `target=<target triple RE>` syntax in `XFAIL:
` etc., and I'll need the same in a subsequent patch.
This patch adds the necessary infrastructure.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
This patch adds SPARC infrastructure to the `openmp` `cmake` files,
matching what is done for other architectures.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
Linking `libomp.so` on 32-bit SPARC `FAIL`s with
```
ld: fatal: file projects/openmp/runtime/src/CMakeFiles/omp.dir/z_Linux_asm.S.o: wrong ELF class: ELFCLASS64
```
This was a 1-stage build with a 64-bit-default `gcc`. Unlike the C++
sources, which were compiled as 32-bit objects due to the use of
`-DCMAKE_CXX_FLAGS=-m32`, the assembler sources were not.
This patch simplifies passing `-m32`: instead of doing it per
architecture, `-m32` is now always passed when the target uses 32-bit
pointers and supports the option.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`sparc-unknown-linux-gnu`, `sparc64-unknown-linux-gnu`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
`openmp` currently doesn't compile on 32-bit Solaris:
```
FAILED: projects/openmp/runtime/src/CMakeFiles/omp.dir/z_Linux_util.cpp.o
[...]
In file included from openmp/runtime/src/z_Linux_util.cpp:78:
In file included from /usr/include/libproc.h:25:
In file included from /usr/include/gelf.h:10:
/usr/include/libelf.h:22:2: error: "large files are not supported by libelf"
22 | #error "large files are not supported by libelf"
| ^
In file included from openmp/runtime/src/z_Linux_util.cpp:78:
/usr/include/libproc.h:42:2: error: "Cannot use libproc in the large file compilation environment"
42 | #error "Cannot use libproc in the large file compilation environment"
| ^
```
Looking closer, there's no point in using `Pgrab` (the only interface
from `<libproc.h>`) at all: the resulting `ps_prochandle_t *` isn't used
in the remainder of the code and the original PR #82930 gives no
indication why it is deemed necessary/useful.
While at it, this patch switches to use `/proc/self/xmap`, matching
`compiler-rt`'s `sanitizer_procmaps_solaris.cpp`, and makes some minor
formatting fixes.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`i386-pc-solaris2.11`, and `amd64-pc-solaris2.11`.
Parts of the `openmp` testsuite currently don't build on SPARC due to
the lack of a `print_possible_return_addresses` definition.
This patch provides one. With it, the vast majority of tests `PASS` on
Solaris/sparcv9 and, with an additional patch, on Linux/sparc64.
The current definition was obtained empirically.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
`libomp` uses `__builtin_return_address` in two places. However, on some
targets those calls need to wrapped in `___builtin_extract_return_addr`
to get at the actual return address. SPARC is among those targets and
the only one where `clang` actually implements this, cf. [[clang][Sparc]
Fix __builtin_extract_return_addr
etc.](https://reviews.llvm.org/D91607). `compiler-rt` needed the same
adjustment, cf. [[sanitizer_common][test] Enable tests on
SPARC](https://reviews.llvm.org/D91608). On other targets, this is a
no-op. However, there are more targets that have the same issue and
`gcc`, unlike `clang`, correctly implements it, so there might be issues
when building `libomp` with `gcc`.
This patch adds the necessary calls.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
Only 2 `openmp` testsuite failures remain on Solaris/amd64. They are due
the same issue: the tests in question assume `NULL` pointers to be
printed as `(nil)` while the rest of the testsuite uses `[[NULL]]` for
that.
This patch changes them to follow suit.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
After PR #138514, only 3 testsuite failures remain on Solaris/amd64. One
of them is
```
libomp :: ompt/loadtool/tool_available_search/tool_available_search.c
```
The issue is that the expected message is that emitted by Linux/glibc,
while the Solaris message differs:
On Linux/x86_64, I get
```
Opening projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so... Failed: projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so: cannot open shared object file: No such file or directory
```
while Solaris/amd64 emits
```
Opening projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so... Failed: ld.so.1: tool_available_search.c.tmp: projects/openmp/runtime/test/ompt/loadtool/tool_available_search/Output/non_existing_file.so: open failed: No such file or directory
```
Since the exact wording is obviously an implementation detail, this
patch allows for both forms.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
Testing `openmp` on Solaris/amd64 shows a large number of failures, all
due to the same issue:
```
# .---command stderr------------
# | openmp/runtime/test/ompt/misc/interoperability.cpp:67:16: error: CHECK-SAME: expected string not found in input
# | // CHECK-SAME: parent_task_frame.reenter={{0x[0-f]+}}
# | ^
# | <stdin>:5:101: note: scanning from here
# | 281474976710658: ompt_event_parallel_begin: parent_task_id=281474976710659, parent_task_frame.exit=0, parent_task_frame.reenter=7fffbedffe90, parallel_id=281474976710661, requested_team_size=2, codeptr_ra=408b8e, invoker=2
```
The testsuite expects pointers to be printed with a `0x` prefix when
using the `%p` format, while Solaris `libc` just prints them in hex
without a prefix.
However, this difference is completely benign. ISO C (up to C23,
7.23.6.1) states
```
p The argument shall be a pointer to void or a pointer to a character
type. The value of the pointer is converted to a sequence of printing
characters, in an implementation-defined manner.
```
I saw two ways around this:
- replace every instance of `%p` with a macro (`KMP_PTR_FMT`, defined as
`"%p"` or `"0x%p" as appropriate), or
- adjust the testsuite to make the `0x` prefix optional
The second route seemed less intrusive and more readable, so that's what
this patch does. While large, it's also completely mechanical.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
PR #138517 broke the Android LLVM builders: ARM doesn't understand the
`@object` form. As it turns out, one can use `%object` instead, which
does assemble on all targets currently supported by `z_Linux_asm.S`.
Tested by rebuilding `libomp.so` on `sparcv9-sun-solaris2.11`.
`libomp` doesn't currently build on Linux/sparc64 due to lack of
`__NR_sched_setaffinity` and `__NR_sched_getaffinity` definitions.
This patch provides those.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
`libomp.so` currently fails to link on SPARC, both Solaris/sparcv9 and
Linux/sparc64:
```
Undefined first referenced
symbol in file
__kmp_unnamed_critical_addr projects/openmp/runtime/src/CMakeFiles/omp.dir/kmp_gsupport.cpp.o
ld: fatal: symbol referencing errors
```
This patch provides the necessary definition. While at it, I noticed
that on non-x86 targets the symbol wasn't marked as `@object`, which
this patch corrects, too.
Tested on `sparcv9-sun-solaris2.11`, `sparc64-unknown-linux-gnu`,
`amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
Building `openmp` on Solaris/amd64, I get
```
In file included from openmp/runtime/src/kmp_utility.cpp:16:
openmp/runtime/src/kmp_wrapper_getpid.h:47:2: warning: No gettid found, use getpid instead [-W#warnings]
47 | #warning No gettid found, use getpid instead
| ^
```
There's no reason to do this: Solaris can use `pthread_self` just as AIX
does.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
When running the `openmp` testsuite on Solaris/amd64, many tests `FAIL`
like
```
# | OMP: Error #11: Stack overflow detected for OpenMP thread #1
```
In a `Debug` build, I also get
```
# | Assertion failure at kmp_runtime.cpp(203): __kmp_gtid_get_specific() < 0 || __kmp_gtid_get_specific() == i.
```
Further investigation shows that just setting `__kmp_gtid_mode` to 3
massively reduces the number of failures.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.