This change makes the runtime decide the intended use of each barrier
invocation, for the OMPT synchronization tool callbacks. The OpenMP 5.0
specification defines four possible barrier kinds -- implicit, explicit,
implementation, and just normal barrier.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D58247
llvm-svn: 355140
Nest-var, OMP_NESTED, omp_set_nested()., and omp_get_nested() have been
deprecated in the 5.0 spec. Initial nesting info is now derived from
OMP_MAX_ACTIVE_LEVELS, OMP_NUM_THREADS, and OMP_PROC_BIND.
This patch deprecates the internal ICV that corresponds to nest-var, and
replaces it with the max-active-levels-var ICV to determine nesting. The
change still allows for use of OMP_NESTED (according to 5.0 changes),
omp_get_nested, and omp_set_nested, which have had deprecation messages
added to them. The change allows certain settings of OMP_NUM_THREADS,
OMP_PROC_BIND, and OMP_MAX_ACTIVE_LEVELS to turn on nesting, but
OMP_NESTED=0 will still force nesting to be off.
The runtime now prints informative messages about deprecation of
OMP_NESTED, omp_set_nested(), and omp_get_nested(), when those
environment variables or routines are used. It also prints deprecated
message in output for KMP_SETTINGS and OMP_DISPLAY_ENV for OMP_NESTED.
This patch also fixes OMP_DISPLAY_ENV output for OMP_TARGET_OFFLOAD.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58408
llvm-svn: 355138
This patch cleans up the yielding code and makes it optional. An
environment variable, KMP_USE_YIELD, was added. Yielding is still
on by default (KMP_USE_YIELD=1), but can be turned off completely
(KMP_USE_YIELD=0), or turned on only when oversubscription is detected
(KMP_USE_YIELD=2). Note that oversubscription cannot always be detected
by the runtime (for example, when the runtime is initialized and the
process forks, oversubscription cannot be detected currently over
multiple instances of the runtime).
Because yielding can be controlled by user now, the library mode
settings (from KMP_LIBRARY) for throughput and turnaround have been
adjusted by altering blocktime, unless that was also explicitly set.
In the original code, there were a number of places where a double yield
might have been done under oversubscription. This version checks
oversubscription and if that's not going to yield, then it does
the spin check.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58148
llvm-svn: 355120
The thread-limit-var and omp_get_thread_limit API was not perfectly handled for
teams construct. Now, when modified by thread_limit clause, omp_get_thread_limit
reports the correct value. In addition, the value is restored when leaving the
teams construct to what it was in the encountering context.
This is done partly by creating the notion of a Contention Group root (CG root)
that keeps track of the thread at the root of each separate CG, the
thread-limit-var associated with the CG, and associated counter of active
threads within the contention group.
thread-limits are passed from master to worker threads via an entry in the ICV
data structure. When a "contention group switch" occurs, a new CG root record is
made and passed from master to worker. A thread could potentially have several
CG root records if it encounters multiple nested teams constructs (but at the
moment the spec doesn't allow for nested teams, so the most one could have
currently is 2). The master of the teams masters gets the thread-limit clause
value stored to its local ICV structure, and the other teams masters copy it
from the master. The thread-limit is set from that ICV copy and restored to the
ICV copy when entering and leaving the teams construct.
This change also fixes a bug when the top-level teams construct team gets
reused, and OMP_DYNAMIC was true, which can cause the expected size of this team
to be smaller than what was actually allocated. The fix updates the size of the
team after its threads were reserved.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D56804
llvm-svn: 353747
The three switch fallthrough generate a warning with -Wimplicit-fallthrough.
Two are documented as fallthrough, one is not, but I think the intention is to also fallthrough in kmp_tasking.cpp.
Not sure whether kmp.h is the best place to define the macro.
Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld
Reviewed By: jlpeyton
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D56397
llvm-svn: 353052
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351648
Add omp_pause_resource and omp_pause_resource_all API and enum, plus stub for
internal implementation. Implemented callable helper function to do local pause,
and added basic functionality for hard and soft pause.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55078
llvm-svn: 351372
The omp-tools.h file is generated from the OpenMP spec to ensure that the interface
is implemented as specified.
The other changes are necessary to update the interface implementation to the
final version as published in 5.0.
The omp-tools.h header was previously called ompt.h, currently a copy under this name
is installed for legacy tools.
Patch partially perpared by @sconvent
Reviewers: AndreyChurbanov, hbae, Hahnfeld
Reviewed By: hbae
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D55579
llvm-svn: 351197
This patch updates the implementation of the ompt_frame_t, ompt_wait_id_t
and ompt_state_t. The final version of the OpenMP 5.0 spec added the "t"
for these types.
Furthermore the structure for ompt_frame_t changed and allows to specify
that the reenter frame belongs to the runtime.
Patch partially prepared by Simon Convent
Reviewers: hbae
llvm-svn: 349458
This patch adds the affinity format functionality introduced in OpenMP 5.0.
This patch adds: Two new environment variables:
OMP_DISPLAY_AFFINITY=TRUE|FALSE
OMP_AFFINITY_FORMAT=<string>
and Four new API:
1) omp_set_affinity_format()
2) omp_get_affinity_format()
3) omp_display_affinity()
4) omp_capture_affinity()
The affinity format functionality has two ICV's associated with it:
affinity-display-var (bool) and affinity-format-var (string).
The affinity-display-var enables/disables the functionality through the
envirable OMP_DISPLAY_AFFINITY. The affinity-format-var is a formatted
string with the special field types beginning with a '%' character
similar to printf
For example, the affinity-format-var could be:
"OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"
The affinity-format-var is displayed by every thread implicitly at the beginning
of a parallel region when any thread's affinity has changed (including a brand
new thread being spawned), or explicitly using the omp_display_affinity() API.
The omp_capture_affinity() function can capture the affinity-format-var in a
char buffer. And omp_set|get_affinity_format() allow the user to set|get the
affinity-format-var explicitly at runtime. omp_capture_affinity() and
omp_get_affinity_format() both return the number of characters needed to hold
the entire string it tried to make (not including NULL character). If not
enough buffer space is available,
both these functions truncate their output.
Differential Revision: https://reviews.llvm.org/D55148
llvm-svn: 349089
Fix two build issues:
1) Recent commit 348756 accidentally included Unix clang compilers
to use immintrin.h when only clang-cl should be using it leading
to the following error:
openmp-llvm/runtime/src/kmp_lock.cpp:2035:25: error: always_
inline function '_xbegin' requires target feature 'rtm', but would be inlined into function
'__kmp_test_adaptive_lock_only' that is compiled without support for 'rtm'
kmp_uint32 status = _xbegin();
This patch changes the guard to use immintrin.h to only use clang-cl instead of all clang
2) gcc-8 gives a warning about multiline comment in kmp_runtime.cpp:
This patch just changes it to a two line comment
openmp-llvm/runtime/src/kmp_runtime.cpp:7697:8: warning: multi-line comment [-Wcomment]
#endif // KMP_OS_LINUX || KMP_OS_DRAGONFLY || KMP_OS_FREEBSD || KMP_OS_NETBSD \
llvm-svn: 348783
Summary: This patch permits OpenMP to build and work (with both gcc and clang) on OpenBSD. It mostly follows what was done for FreeBSD and NetBSD, except OpenBSD does not have pthread_getattr_np support, so it follows OS X in that one instance.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: guansong, jfb, emaste, mgorny, krytarowski, #openmp
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D34280
llvm-svn: 348726
Summary:
Additions mostly follow FreeBSD and NetBSD and are not intrusive.
There is similar patch for OpenBSD: https://reviews.llvm.org/D34280
The -lm was being omitted due to -Wl,--as-needed in cmake rule, similar patch is in freebsd-ports/devel/llvm-devel port.
Simple OpenMP programs compile and work as expected:
$ clang-devel ~/omp_hello.c -fopenmp -I/usr/local/llvm-devel/include
$ LD_LIBRARY_PATH=/usr/local/llvm-devel/lib OMP_NUM_THREADS=100 ./a.out
The assertion in LLVMgold.so when -fopenmp was used together with -flto in 20170524 snapshot is no longer triggered on current svn-trunk and works fine as in llvm-4.0 with our local patches.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: dexonsmith, jfb, krytarowski, guansong, gregrodgers, emaste, mgorny, mehdi_amini
Differential Revision: https://reviews.llvm.org/D35129
llvm-svn: 348725
There is low probability that array th_hot_teams can be
accessed out of bound (when many nested levels are requested
to keep hot teams via KMP_HOT_TEAMS_MAX_LEVEL). The patch
adds the check of index that fixes the problem.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D54950
llvm-svn: 347800
Initializing an ompt_data_t object using the pointer union member is potentially
unsafe in 32-bit programs. This change fixes the issue
by using the constant, ompt_data_none.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D52046
llvm-svn: 343785
Some types and callback signatures have changed from TR6 to TR7.
Major changes (only adding signatures and stubs):
(-remove idle callback) done by D48362
-add reduction and dispatch callback
-add get_task_memory and finalize_tool runtime entry points
-ompt_invoker_t becomes ompt_parallel_flag_t
-more types of sync_regions
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D50774
llvm-svn: 341834
Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries,
and OMP_ALLOCATOR environment variable.
Added support for HBW memory on Linux if libmemkind.so library is accessible
(dynamic library only, no support for static libraries).
Only used stable API (hbwmalloc) of the memkind library
though we may consider using experimental API in future.
The ICV def-allocator-var is implemented per implicit task similar to
place-partition-var. In the absence of a requested allocator, the uses the
default allocator.
Predefined allocators (the only ones currently available) are made similar
for C and Fortran, - pointers (long integers) with values 1 to 8.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D51232
llvm-svn: 341687
If hot teams are not being used, this code could seg fault without the added
check, and does so when composability is used in conjunction with nesting.
The fix prevents the segfault.
Differential Revision: https://reviews.llvm.org/D50649
llvm-svn: 340629
This patch cleans up unused functions, variables, sign compare issues, and
addresses some -Warning flags which are now enabled including -Wcast-qual.
Not all the warning flags in LibompHandleFlags.cmake are enabled, but some
are with this patch.
Some __kmp_gtid_from_* macros in kmp.h are switched to static inline functions
which allows us to remove the awkward definition of KMP_DEBUG_ASSERT() and
KMP_ASSERT() macros which used the comma operator. This had to be done for the
innumerable -Wunused-value warnings related to KMP_DEBUG_ASSERT()
Differential Revision: https://reviews.llvm.org/D49105
llvm-svn: 339393
1) Remove unnecessary data from list node structure
2) Remove timerPair in favor of pushing/popping explicitTimers.
This way, nested timers will work properly.
3) Fix #pragma omp critical timers
4) Add histogram capability
5) Add KMP_STATS_FILE formatting capability
6) Have time partitioned into serial & parallel by introducing
partitionedTimers::exchange(). This also counts the number of serial regions
in the executable.
7) Fix up the timers around OMP loops so that scheduling overhead and work are
both counted correctly.
8) Fix up the iterations statistics so they count the number of iterations the
thread receives at each loop scheduling event
9) Change timers so there is only one RDTSC read per event change
10) Fix up the outdated comments for the timers
Differential Revision: https://reviews.llvm.org/D49699
llvm-svn: 338276
This change fixes possibly invalid access to the internal data structure during
library shutdown. In a heavily oversubscribed situation, the library shutdown
sequence can reach the point where resources are deallocated while there still
exist threads in their final spinning loop. The added loop in
__kmp_internal_end() checks if there are such busy-waiting threads and blocks
the shutdown sequence if that is the case. Two versions of kmp_wait_template()
are now used to minimize performance impact.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49452
llvm-svn: 337486
This patch introduces the logic implementing hierarchical scheduling.
First and foremost, hierarchical scheduling is off by default
To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
This work is based off if the IWOMP paper:
"Workstealing and Nested Parallelism in SMP Systems"
Hierarchical scheduling is the layering of OpenMP schedules for different layers
of the memory hierarchy. One can have multiple layers between the threads and
the global iterations space. The threads will go up the hierarchy to grab
iterations, using possibly a different schedule & chunk for each layer.
[ Global iteration space (0-999) ]
(use static)
[ L1 | L1 | L1 | L1 ]
(use dynamic,1)
[ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
In the example shown above, there are 8 threads and 4 L1 caches begin targeted.
If the topology indicates that there are two threads per core, then two
consecutive threads will share the data of one L1 cache unit. This example
would have the iteration space (0-999) split statically across the four L1
caches (so the first L1 would get (0-249), the second would get (250-499), etc).
Then the threads will use a dynamic,1 schedule to grab iterations from the L1
cache units. There are currently four supported layers: L1, L2, L3, NUMA
OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK
And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before
I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
to try to keep it separate from the rest of the code.
Differential Revision: https://reviews.llvm.org/D47962
llvm-svn: 336571
These are preliminary changes that attempt to use C++11 Atomics in the runtime.
We are expecting better portability with this change across architectures/OSes.
Here is the summary of the changes.
Most variables that need synchronization operation were converted to generic
atomic variables (std::atomic<T>). Variables that are updated with combined CAS
are packed into a single atomic variable, and partial read/write is done
through unpacking/packing
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D47903
llvm-svn: 336563
implicit_task_end callbacks in nested parallel regions did not always give the
correct thread_num, since the inner parallel region may have already been
finalized.
Now, the thread_num is stored at the beginning of the implicit task and
retrieved at the end, whenever necessary.
A testcase was added as well.
Differential Revision: https://reviews.llvm.org/D46260
llvm-svn: 331632
This is required to be NULL for implicit barriers at the end of a
parallel region. Noticed in review of D43191.
Differential Revision: https://reviews.llvm.org/D43308
llvm-svn: 325922
This change simplifies __kmp_expand_threads to take a single argument.
Previously, it allowed two arguments and had logic to decide on different
potential expansion sizes. However, no calls to __kmp_expand_threads in the
runtime make use of this extra logic. Thus the extra argument and logic is
removed here.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D41836
llvm-svn: 322204
This change improves stability of the runtime when the application forks child
processes. Acquiring/releasing __kmp_initz_lock and __kmp_forkjoin_lock in the
atfork handlers insures that the actual fork does not occur while those two
locks are held, and __kmp_itt_reset() reverts the itt's global state to the
initial state which also initializes the mutex stored in the global state.
Some missing initialization code was also inserted in the child's atfork handler.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D41462
llvm-svn: 322202
This change makes kmp_r_sched_t type into a union for simpler
comparisons and assignments
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D40374
llvm-svn: 319379
This is part of the renaming of data types from OpenMP TR4 to TR6
Patch by Simon Convent
Differential Revision: https://reviews.llvm.org/D39326
llvm-svn: 317435
The TR6 document is expected to be publically released around November 15.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D39182
llvm-svn: 317339
This is part of the renaming of data types from OpenMP TR4 to TR6
Patch by Simon Convent
Differential Revision: https://reviews.llvm.org/D39326
llvm-svn: 317338
The code is tested to work with latest clang, GNU and Intel compiler. The implementation
is optimized for low overhead when no tool is attached shifting the cost to execution with
tool attached.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D38185
llvm-svn: 317085
Removes semicolons after if {} blocks, function definitions, etc.
I was able to apply the large OMPT patch cleanly on top of this one
with no conflicts.
llvm-svn: 314340
Minor code cleanup of Klocwork issues. Fatal messages are given no return
attribute. Define and use KMP_NORETURN to work for multiple C++ versions.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D37275
llvm-svn: 312538
Cleanup code to remove BUILD_TV and unused code bracketed by it.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D36011
llvm-svn: 311114
This change improves the way threads are spread across cores
when OMP_PROC_BIND=spread is set and no unusual affinity masks are in use.
Differential Revision: https://reviews.llvm.org/D36510
llvm-svn: 310670