108 Commits

Author SHA1 Message Date
Fangrui Song
678e3ee123 [lldb] Fix duplicate word typos; NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 21:32:24 -07:00
Terry Wilmarth
f0221fb1d7 [OpenMP] Add option to use different units for blocktime
This change adds the option of using different units for blocktimes specified via the KMP_BLOCKTIME environment variable. The parsing of the environment now recognizes units suffixes: ms and us. If a units suffix is not specified, the default unit is ms. Thus default behavior is still the same, and any previous usage still works the same. Internally, blocktime is now converted to microseconds everywhere, so settings that exceed INT_MAX in microseconds are considered "infinite".

kmp_set/get_blocktime are updated to use the units the user specified with KMP_BLOCKTIME, and if not specified, ms are used.

Added better range checking and inform messages for the two time units. Large values of blocktime for default (ms) case (beyond INT_MAX/1000) are no longer allowed, but will autocorrect with an INFORM message.

The delay for determining ticks per usec was lowered.  It is now 1 million ticks which was calculated as ~450us based on 2.2GHz clock which is pretty typical base clock frequency on X86:
(1e6 Ticks)  /  (2.2e9 Ticks/sec)  *  (1e6 usec/sec)  =  454 usec
Really short benchmarks can be affected by longer delay.

Update KMP_BLOCKTIME docs.

Portions of this commit were authored by Johnny Peyton.

Differential Revision: https://reviews.llvm.org/D157646
2023-08-18 14:01:13 -05:00
Jonathan Peyton
b34c7d8c8e [OpenMP] Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY
* Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API

* Add printout of leader to hardware thread dump

* Allow OMP_PLACES to restrict fullMask

This change fixes an issue with the OMP_PLACES=resource(#) syntax.
Before this change, specifying the number of resources did NOT change
the default number of threads created by the runtime. e.g.,
OMP_PLACES=cores(2) would still create __kmp_avail_proc number of
threads. After this change, the fullMask and __kmp_avail_proc are
modified if necessary so that the final place list dictates which
resources are available and how thus, how many threads are created by
default.

* Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY

For OMP_PLACES, two new features are added:
  1) OMP_PLACES=cores:<attribute> where <attribute> is either
     intel_atom, intel_core, or eff# where # is 0 - number of core
     efficiencies-1. This syntax also supports the optional (#)
     number selection of resources.
  2) OMP_PLACES=core_types|core_effs where this setting will create
     the number of core_types (or core_effs|core_efficiencies).

For KMP_AFFINITY, the granularity setting is expanded to include two new
keywords: core_type, and core_eff (or core_efficiency). This will set
the granularity to include all cores with a particular core type (or
efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will
create threads which can float across a single core type.

Differential Revision: https://reviews.llvm.org/D154547
2023-07-31 13:55:32 -05:00
Jonathan Peyton
4e680ae5f2 [OpenMP] Move KMP_VERSION printout logic to post-serial-init
Get the KMP_VERSION printout logic out of environment variable file
(kmp_settings.cpp) and move to end of serial initialization where
KMP_SETTINGS and OMP_DISPLAY_ENV are.

Differential Revision: https://reviews.llvm.org/D154652
2023-07-24 16:02:03 -05:00
Nawrin Sultana
50a95e3e6b [OpenMP] Minor improvement in error msg and fixes few coverity reported issues
Differential Revision: https://reviews.llvm.org/D152289
2023-07-05 12:07:51 -05:00
Adrian Munera
028cf8c016 [OpenMP] Implement printing TDGs to dot files
This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies.

It is activated through a new environment variable "KMP_TDG_DOT"

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D150962
2023-06-19 08:27:38 -05:00
Hansang Bae
bd46706b1f [OpenMP][libomp] Allow white spaces in OMP_TARGET_OFFLOAD value
Remove heading/trailing white spaces when matching OMP_TARGET_OFFLOAD
value.

Differential Revision: https://reviews.llvm.org/D149890
2023-06-05 17:41:54 -05:00
Chenle Yu
36d4e4c9b5 [OpenMP] Implement task record and replay mechanism
This patch implements the "task record and replay" mechanism.  The idea is to be able to store tasks and their dependencies in the runtime so that we do not pay the cost of task creation and dependency resolution for future executions. The objective is to improve fine-grained task performance, both for those from "omp task" and "taskloop".

The entry point of the recording phase is __kmpc_start_record_task, and the end of record is triggered by __kmpc_end_record_task.

Tasks encapsulated between a record start and a record end are saved, meaning that the runtime stores their dependencies and structures, referred to as TDG, in order to replay them in subsequent executions. In these TDG replays, we start the execution by scheduling all root tasks (tasks that do not have input dependencies), and there will be no involvement of a hash table to track the dependencies, yet tasks do not need to be created again.

At the beginning of __kmpc_start_record_task, we must check if a TDG has already been recorded. If yes, the function returns 0 and starts to replay the TDG by calling __kmp_exec_tdg; if not, we start to record, and the function returns 1.

An integer uniquely identifies TDGs. Currently, this identifier needs to be incremented manually in the source code. Still, depending on how this feature would eventually be used in the library, the caller function must do it; also, the caller function needs to implement a mechanism to skip the associated region, according to the return value of __kmpc_start_record_task.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D146642
2023-05-15 10:00:55 -05:00
Jonathan Peyton
96696b882b [OpenMP][libomp] Fix disabled affinity
Fix setting affinity type and topology method when affinity is disabled
and fix places that were not taking into account that affinity can be
explicitly disabled by putting proper KMP_AFFINITY_CAPABLE() check.

Differential Revision: https://reviews.llvm.org/D137176
2022-11-02 15:37:41 -05:00
Jonathan Peyton
7a9643fd2a [OpenMP][libomp] Add hidden helper affinity
Add new hidden helper affinity via the environment variable,
KMP_HIDDEN_HELPER_AFFINITY, which allows users to assign thread
affinity to hidden helper threads using the same syntax as
KMP_AFFINITY. OMP_PLACES/OMP_PROC_BIND have no interaction with
KMP_HIDDEN_HELPER_AFFINITY.

Differential Revision: https://reviews.llvm.org/D135113
2022-10-28 15:21:07 -05:00
Jonathan Peyton
174502fc14 [OpenMP][libomp] Parameterize affinity functions
This patch parameterizes the affinity initialization code to allow multiple
affinity settings. Almost all global affinity settings are consolidated
and put into a structure kmp_affinity_t. This is in anticipation of the
addition of hidden helper affinity which will have the same syntax and
semantics as KMP_AFFINITY only for the hidden helper team.

Differential Revision: https://reviews.llvm.org/D135109
2022-10-28 15:21:06 -05:00
AndreyChurbanov
17dcde5f1b [OpenMP][libomp] Allow reset affinity mask after parallel
Added control to reset affinity of primary thread after outermost parallel
region to initial affinity encountered before OpenMP runtime was initialized.
KMP_AFFINITY environment variable reset/noreset modifier introduced.
Default behavior is unchanged.

Differential Revision: https://reviews.llvm.org/D125993
2022-07-19 13:05:05 -05:00
Jonathan Peyton
f613e6d19d [OpenMP][libomp] Fix accidental removal of else for core attributes 2022-05-19 14:00:27 -05:00
AndreyChurbanov
c44ba01de7 [OpenMP] libomp: honor passive wait policy requested with tasking
Currently the library ignores requested wait policy in the presence
of tasking. Threads always actively spin. The patch fixes this problem
making the wait policy passive if this explicitly requested by user.

Differential Revision: https://reviews.llvm.org/D123044
2022-05-18 10:04:30 -05:00
AndreyChurbanov
cb1bee4725 [OpenMP] libomp: fix UB when LIBOMP_NUM_HIDDEN_HELPER_THREADS=1.
The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads.
Thus number of worker hidden helper threads corresponds to user request,
main thread of helper team excluded as it does not participate in actual work.
This also fixes divide-by-0 issue in the code.

Fixes #48656

Differential Revision: https://reviews.llvm.org/D119586
2022-02-12 03:00:38 +03:00
Terry Wilmarth
2e02579a76 [OpenMP] Add use of TPAUSE
Add use of TPAUSE (from WAITPKG) to the runtime for Intel hardware,
with an envirable to turn it on in a particular C-state.  Always uses
TPAUSE if it is selected and enabled by Intel hardware and presence of
WAITPKG, and if not, falls back to old way of checking
__kmp_use_yield, etc.

Differential Revision: https://reviews.llvm.org/D115758
2022-01-18 10:14:32 -06:00
Jonathan Peyton
6a556ecaf4 [OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET
This patch allows the user to request all resources of a particular
layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified
so the number of units requested is optional or can be replaced with an
'*' character.

e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3
e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores
e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per
      each socket and 1 thread per core.

Differential Revision: https://reviews.llvm.org/D115826
2021-12-20 13:45:21 -06:00
Jonathan Peyton
9769340905 [OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes
Add missing guards around x86-specific code.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115664
2021-12-14 08:33:05 +01:00
Jonathan Peyton
df20599597 [OpenMP][libomp] Add core attributes to KMP_HW_SUBSET
Allow filtering of resources based on core attributes. There are two new
attributes added:
1) Core Type (intel_atom, intel_core)
2) Core Efficiency (integer) where the higher the efficiency, the more
   performant the core
On hybrid architectures , e.g., Alder Lake, users can specify
KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom
and first four Big cores. The can also use the efficiency syntax. e.g.,
KMP_HW_SUBSET=2c:eff0,2c:eff1

Differential Revision: https://reviews.llvm.org/D114901
2021-12-10 14:34:33 -06:00
Peyton, Jonathan L
286094af9b [OpenMP][libomp] Improve Windows Processor Group handling within topology
The current implementation of Windows Processor Groups has
a separate topology method to handle them. This patch deprecates
that specific method and uses the regular CPUID topology
method by default and inserts the Windows Processor Group objects
in the topology manually.

Notes:
* The preference for processor groups is lowered to a value less than
  socket so that the user will see sockets in the KMP_AFFINITY=verbose
  output instead of processor groups when sockets=processor groups.
* The topology's capacity is modified to handle additional topology layers
  without the need for reallocation.
* If a user asks for a granularity setting that is "above" the processor
  group layer, then the granularity is adjusted "down" to the processor
  group since this is the coarsest layer available for threads.

Differential Revision: https://reviews.llvm.org/D112273
2021-11-17 16:29:01 -06:00
@t-msn
0808d956c4 [OpenMP] libomp: Fix handling of barrier pattern environment variables
It is better to set all barrier patterns to use "dist" when at least
one environment variable specifies "dist". Otherwise if only one
environment is set to "dist" and others left blank inadvertently,
it would result in mixing dist barrier with default hyper barrier
pattern.

Differential Revision: https://reviews.llvm.org/D112597
2021-11-08 15:01:26 +03:00
Peyton, Jonathan L
50b68a3d03 [OpenMP][host runtime] Add support for teams affinity
This patch implements teams affinity on the host.
The default is spread. A user can specify either spread, close, or
primary using KMP_TEAMS_PROC_BIND environment variable. Unlike
OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a
list of values. The values follow the same semantics under the OpenMP
specification for parallel regions except T is the number of teams in
a league instead of the number of threads in a parallel region.

Differential Revision: https://reviews.llvm.org/D109921
2021-10-14 16:30:28 -05:00
AndreyChurbanov
5e58b63b28 [OpenMP] libomp: fix warning on comparison of integer expressions of different signedness
Replaced macro with global variable of correspondent type.

Differential Revision: https://reviews.llvm.org/D111562
2021-10-13 20:11:47 +03:00
Peyton, Jonathan L
343b9e8590 [OpenMP][host runtime] Introduce kmp_cpuinfo_flags_t to replace integer flags
Store CPUID support flags as bits instead of using entire integers.

Differential Revision: https://reviews.llvm.org/D110091
2021-10-01 11:08:39 -05:00
Martin Storsjö
f5616a981c [OpenMP] Fix the usage of sscanf on MinGW
KMP_SSCANF only evaluates to sscanf_s within
    #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT
so we need to pass the sscanf_s specific parameters within a similar
condition.

Differential Revision: https://reviews.llvm.org/D108196
2021-08-17 21:36:09 +03:00
Peyton, Jonathan L
b4a1f441d9 [OpenMP] Add a few small fixes
* Add comment to help ensure new construct data are added in two places
* Check for division by zero in the loop worksharing code
* Check for syntax errors in parrange parsing

Differential Revision: https://reviews.llvm.org/D105929
2021-08-16 10:02:49 -05:00
Peyton, Jonathan L
6eeb4c1f32 [OpenMP] Fix incorrect parameters to sscanf_s call
On Windows, the documentation states that when using sscanf_s,
each %c and %s specifier must also have additional size parameter.
This patch adds the size parameter in the one place where %c is
used.

Differential Revision: https://reviews.llvm.org/D105931
2021-08-16 09:59:21 -05:00
Terry Wilmarth
d8e4cb9121 [OpenMP] libomp: Add new experimental barrier: two-level distributed barrier
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.

This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers. There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.

The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns. Thus, to turn it on, the following settings are required:

KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist

Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.

Patch fixed for ITTNotify disabled builds and non-x86 builds

Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com>
Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com>

Differential Revision: https://reviews.llvm.org/D103121
2021-07-29 14:09:26 -05:00
Johannes Doerfert
4eb90e893f Revert "[OpenMP] Add Two-level Distributed Barrier"
This reverts commit 25073a4ecfc9b2e3cb76776185e63bfdb094cd98.

This breaks non-x86 OpenMP builds for a while now. Until a solution is
ready to be upstreamed we revert the feature and unblock those builds.
See:
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821
and
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821

The currently proposed fix (D104788) seems not to be ready yet:
  https://reviews.llvm.org/D104788#2841928
2021-06-29 09:38:27 -05:00
AndreyChurbanov
5dd4d0d46f [OpenMP] libomp: fix dynamic loop dispatcher
Restructured dynamic loop dispatcher code.
Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule:
- eliminated possibility of stealing iterations of the wrong loop when victim
  thread changed its buffer to work on another loop;
- fixed race when victim thread changed its buffer to work in nested parallel;
- eliminated "static" property of the schedule, that is now a single thread can
  execute whole loop.

Differential Revision: https://reviews.llvm.org/D103648
2021-06-22 16:29:01 +03:00
Terry Wilmarth
25073a4ecf [OpenMP] Add Two-level Distributed Barrier
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.

This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers.  There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.

The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns.  Thus, to turn it on, the following settings are required:

KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist

Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.

Differential Revision: https://reviews.llvm.org/D103121
2021-06-16 15:34:55 -05:00
Vignesh Balasubramanian
f61602b0d3 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs.
It contains support code required in "openmp/runtime" for OMPD implementation.

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100181
2021-06-08 16:44:22 +05:30
Terry Wilmarth
8ec9aa236e [OpenMP] Add experimental nesting mode feature
Nesting mode is a new experimental feature in the OpenMP
runtime. It allows a user to set up nesting for an application in a
way that corresponds to the hardware topology levels on the machine an
application is being run on.  For example, if a machine has 2 sockets,
each with 12 cores, then use of nesting mode could set up an outer
level of nesting that uses 2 threads per parallel region, and an inner
level of nesting that uses 12 threads per parallel region.

Nesting mode is controlled with the KMP_NESTING_MODE environment
variable as follows:

1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var
is set to 1 (the default -- nesting is off, nested parallel regions
are serialized).

2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads
will be assigned for each level discovered in the machine topology;
max-active-levels-var is set to the number of levels discovered.

3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change
or be removed in the future.] Nesting mode is on, and a number of
threads will be assigned for each topology level discovered on the
machine, up to k<=n levels (since there may be fewer than n levels
discovered in the topology), and beyond the kth level, nested parallel
regions will be serialized; NOTE: max-active-levels-var is 1 (the default --
nesting is off, and nested parallel regions are serialized until the
user changes max-active-levels-var.

If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will
override KMP_NESTING_MODE settings for the associated environment
variables. The detected topology may be limited by an affinity mask
setting on the initial thread, or if the user sets KMP_HW_SUBSET. See
also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for
nested parallel regions. Note that this feature only sets numbers of
threads used at nesting levels.  The user should make use of
OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those
threads, if desired.

Differential Revision: https://reviews.llvm.org/D102188
2021-06-04 16:01:11 -05:00
Peyton, Jonathan L
9982f33e2c [OpenMP] Refactor/Rework topology discovery code
This patch does the following:

1) Introduce kmp_topology_t as the runtime-friendly structure (the
corresponding global variable is __kmp_topology) to determine the
exact machine topology which can vary widely among current and future
architectures. The current design is not easy to expand beyond the assumed
three layer topology: sockets, cores, and threads so a rework capable of
using the existing KMP_AFFINITY mechanisms is required.

This new topology structure has:
* The depth and types of the topology
* Ratio count for each consecutive level (e.g., number of cores per
   socket, number of threads per core)
* Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
* Equivalent topology layer map (e.g., Numa domain is equivalent to
   socket, L1/L2 cache equivalent to core)
* Whether it is uniform or not

The hardware threads are represented with the kmp_hw_thread_t
structure. This structure contains the ids (e.g., socket 0, core 1,
thread 0) and other information grabbed from the previous Address
structure. The kmp_topology_t structure contains an array of these.

2) Generalize the KMP_HW_SUBSET envirable for the new
kmp_topology_t structure. The algorithm doesn't assume any order with
tiles,numa domains,sockets,cores,threads. Instead it just parses the
envirable, makes sure it is consistent with the detected topology
(including taking into account equivalent layers) and then trims away
the unneeded subset of hardware threads. To enable this, a new
kmp_hw_subset_t structure is introduced which contains a vector of
items (hardware type, number user wants, offset). Any keyword within
__kmp_hw_get_keyword() can be used as a name and can be shortened as
well. e.g.,
KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.

3) Simplify topology detection functions so they only do the singular
task of detecting the machine's topology. Printing, and all
canonicalizing functionality is now done afterwards. So many lines of
duplicated code are eliminated.

4) Add new ll_caches and numa_domains to OMP_PLACES, and
consequently, KMP_AFFINITY's granularity setting. All the names within
__kmp_hw_get_keyword() are available for use in OMP_PLACES or
KMP_AFFINITY's granularity setting.

5) Simplify and future-proof code where explicit lists of allowed
affinity settings keywords inside if() conditions.

6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method
so equivalent caches could be detected (in particular for the ll_caches
place).

Differential Revision: https://reviews.llvm.org/D100997
2021-05-03 18:00:24 -05:00
Hansang Bae
77dc7b4653 [OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT
Also fixed typo in the verbose message.

Differential Revision: https://reviews.llvm.org/D100414
2021-04-14 07:55:26 -05:00
Hansang Bae
467f39249d [OpenMP] Misc. changes that add or remove pointer/bound checks
-- Added or moved checks to appropriate places.
-- Removed ineffective null check where the pointer is already being
   dereferenced around the code.
-- Initialized variables that can be used without definitions.
-- Added call to dlclose/FreeLibrary in OMPT tool activation.
-- Added a new build compiler definition.

Differential Revision: https://reviews.llvm.org/D98584
2021-03-23 18:55:08 -05:00
Shilei Tian
2df65f87c1 [OpenMP] Fixed a crash in hidden helper thread
It is reported that after enabling hidden helper thread, the program
can hit the assertion `new_gtid < __kmp_threads_capacity` sometimes. The root
cause is explained as follows. Let's say the default `__kmp_threads_capacity` is
`N`. If hidden helper thread is enabled, `__kmp_threads_capacity` will be offset
to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via
`num_threads` clause, we need to expand `__kmp_threads`. In
`__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity`, and
repeatedly doubling it until the new capacity meets the requirement. Let's
assume the new requirement is `Y`.  If `Y` happens to meet the constraint
`(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not
enough because we have 8 slots for hidden helper threads.

Here is an example.
```
#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  return 0;
}
```
My CPU is 20C40T, then `__kmp_threads_capacity` is 160. After offset,
`__kmp_threads_capacity` becomes 168. `1344 = (160+8)*2^3`, then the assertions
hit.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D98838
2021-03-18 18:25:36 -04:00
tlwilmar
97d000cfc6 Added API for "masked" construct via two entrypoints: __kmpc_masked,
and __kmpc_end_masked. The "master" construct is deprecated. Changed
proc-bind keyword from "master" to "primary". Use of both master
construct and master as proc-bind keyword is still allowed, but
deprecated.

Remove references to "master" in comments and strings, and replace
with "primary" or "primary thread". Function names and variables were
not touched, nor were references to deprecated master construct. These
can be updated over time. No new code should refer to master.
2021-03-05 09:29:57 -06:00
Peyton, Jonathan L
8c73be9d86 [OpenMP] Limit number of dispatch buffers
This patch limits the number of dispatch buffers (used for
loop worksharing construct) to between 1 and 4096.

Differential Revision: https://reviews.llvm.org/D96749
2021-02-22 13:14:28 -06:00
Shilei Tian
309b00a42e [OpenMP][NFC] clang-format the whole openmp project
Same script as D95318. Test files are excluded.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D97088
2021-02-20 12:46:32 -05:00
AndreyChurbanov
d7b12004bd [OpenMP] libomp: implement nteams-var and teams-thread-limit-var ICVs
The change includes OMP_NUM_TEAMS, OMP_TEAMS_THREAD_LIMIT env variables,
omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit,
omp_get_teams_thread_limit routines.

Differential Revision: https://reviews.llvm.org/D95003
2021-02-01 22:54:11 +03:00
Jonathan Peyton
67773681c0 [OpenMP] Add environment variable to force monotonic dynamic scheduling
This patch introduces a new environment variable to force monotonic
behavior for users that absolutely need it.  This is in anticipation
of 5.0 change that uses non-monotonic behavior for dynamic scheduling
by default. Fixes for that and the actual switch are coming soon.

Differential Revision: https://reviews.llvm.org/D95263
2021-01-29 12:23:27 -06:00
AndreyChurbanov
7f5ad0e071 [OpenMP] libomp: fix build by cl with vs2019
Replace VLA with dynamic allocation using alloca().
This fixes https://bugs.llvm.org/show_bug.cgi?id=48919.

Differential Revision: https://reviews.llvm.org/D95627
2021-01-29 13:16:41 +03:00
Peyton, Jonathan L
8e67134364 [OpenMP] Fix misleading warning for OMP_PLACES
When OMP_PLACES contains an invalid value, the warning informs the user
that the fallback is OMP_PLACES=threads, but the actual internal setting
is OMP_PLACES=cores and is detected as such with KMP_SETTINGS=1.
This patch informs the user that OMP_PLACES=cores is being used instead
of OMP_PLACES=threads.

Differential Revision: https://reviews.llvm.org/D95170
2021-01-27 14:27:24 -06:00
Peyton, Jonathan L
598c590b3c [OpenMP] Add cpuid leaf 1f topology discovery
This patch adds the new algorithm for topology discovery using cpuid
leaf 1f.  Only the new die level is detected and integrated into the
current affinity mechanisms including KMP_AFFINITY (granularity level
and compact/scatter algorithm), OMP_PLACES=dies, and KMP_HW_SUBSET.

Differential Revision: https://reviews.llvm.org/D95157
2021-01-27 14:27:23 -06:00
Nawrin Sultana
927af4b3c5 [OpenMP] Modify OMP_ALLOCATOR environment variable
This patch sets the def-allocator-var ICV based on the environment variables
provided in OMP_ALLOCATOR. Previously, only allowed value for OMP_ALLOCATOR
was a predefined memory allocator. OpenMP 5.1 specification allows predefined
memory allocator, predefined mem space, or predefined mem space with traits in
OMP_ALLOCATOR. If an allocator can not be created using the provided environment
variables, the def-allocator-var is set to omp_default_mem_alloc.

Differential Revision: https://reviews.llvm.org/D94985
2021-01-26 18:27:39 -06:00
Shilei Tian
9d64275ae0 [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-25 22:16:17 -05:00
AndreyChurbanov
a60bc55c69 [OpenMP] libomp: cleanup parsing of OMP_ALLOCATOR env variable.
Differential Revision: https://reviews.llvm.org/D94932
2021-01-19 16:21:22 +03:00
Shilei Tian
9bf843bdc8 Revert "[OpenMP] Added the support for hidden helper task in RTL"
This reverts commit ed939f853da1f2266f00ea087f778fda88848f73.
2021-01-18 06:57:52 -05:00
Shilei Tian
ed939f853d [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-16 14:13:35 -05:00