There were quite a few compilation warnings when building openmp on Windows with
the latest Visual Studios 2022 version 17.8.4. Some other warnings were visible
with the latest Clang at tip. This commit fixes all of them.
This commit adds skewed distribution of iterations in
nonmonotonic:dynamic schedule (static steal) for hybrid systems when
thread affinity is assigned. Currently, it distributes the iterations at
60:40 ratio. Consider this loop with dynamic schedule type,
for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware
threads (16 CORE and 4 ATOM core), 88 iterations will be assigned to
performance cores and 12 iterations will be assigned to efficient cores.
Each thread with CORE core will process 5 iterations + extras and with
ATOM core will process 3 iterations.
Differential Revision: https://reviews.llvm.org/D152955
* openmp/README.rst
- Add s390x to those platforms supported
* openmp/libomptarget/plugins-nextgen/CMakeLists.txt
- Add s390x subdirectory
* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
- Add s390x definitions
* openmp/runtime/CMakeLists.txt
- Add s390x to those platforms supported
* openmp/runtime/cmake/LibompGetArchitecture.cmake
- Define s390x ARCHITECTURE
* openmp/runtime/cmake/LibompMicroTests.cmake
- Add dependencies for System z (aka s390x)
* openmp/runtime/cmake/LibompUtils.cmake
- Add S390X to the mix
* openmp/runtime/cmake/config-ix.cmake
- Add s390x as a supported LIPOMP_ARCH
* openmp/runtime/src/kmp_affinity.h
- Define __NR_sched_[get|set]addinity for s390x
* openmp/runtime/src/kmp_config.h.cmake
- Define CACHE_LINE for s390x
* openmp/runtime/src/kmp_os.h
- Add KMP_ARCH_S390X to support checks
* openmp/runtime/src/kmp_platform.h
- Define KMP_ARCH_S390X
* openmp/runtime/src/kmp_runtime.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/kmp_tasking.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
- Define ITT_ARCH_S390X
* openmp/runtime/src/z_Linux_asm.S
- Instantiate __kmp_invoke_microtask for s390x
* openmp/runtime/src/z_Linux_util.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/test/ompt/callback.h
- Define print_possible_return_addresses for s390x
* openmp/runtime/tools/lib/Platform.pm
- Return s390x as platform and host architecture
* openmp/runtime/tools/lib/Uname.pm
- Set hardware platform value for s390x
This change has the primary thread create each thread's initial mask
and topology information so it is available immediately after
forking. The setting of mask/topology information is decoupled from the
actual binding. Also add this setting of topology information inside the
__kmp_partition_places mechanism for OMP_PLACES+OMP_PROC_BIND.
Without this, there could be a timing window after the primary
thread signals the workers to fork where worker threads have not yet
established their affinity mask or topology information.
Each worker thread will then bind to the location the primary thread
sets.
Differential Revision: https://reviews.llvm.org/D156727
* Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API
* Add printout of leader to hardware thread dump
* Allow OMP_PLACES to restrict fullMask
This change fixes an issue with the OMP_PLACES=resource(#) syntax.
Before this change, specifying the number of resources did NOT change
the default number of threads created by the runtime. e.g.,
OMP_PLACES=cores(2) would still create __kmp_avail_proc number of
threads. After this change, the fullMask and __kmp_avail_proc are
modified if necessary so that the final place list dictates which
resources are available and how thus, how many threads are created by
default.
* Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY
For OMP_PLACES, two new features are added:
1) OMP_PLACES=cores:<attribute> where <attribute> is either
intel_atom, intel_core, or eff# where # is 0 - number of core
efficiencies-1. This syntax also supports the optional (#)
number selection of resources.
2) OMP_PLACES=core_types|core_effs where this setting will create
the number of core_types (or core_effs|core_efficiencies).
For KMP_AFFINITY, the granularity setting is expanded to include two new
keywords: core_type, and core_eff (or core_efficiency). This will set
the granularity to include all cores with a particular core type (or
efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will
create threads which can float across a single core type.
Differential Revision: https://reviews.llvm.org/D154547
Get rid of explicit mask alloc, getthreadaffinity, set temp affinity,
reset to old affinity, dealloc steps in favor of existing
kmp_affinity_raii_t to push/pop a temporary affinity.
Differential Revision: https://reviews.llvm.org/D154650
Each time a thread gets a new affinity assigned, it will not
only assign its mask, but also topology information including
which socket, core, thread and core-attributes (if available)
it is now assigned. This occurs for all non-disabled KMP_AFFINITY
values as well as OMP_PLACES/OMP_PROC_BIND.
The information regarding which socket, core, etc. can take on three
values:
1) The actual ID of the unit (0 - (N-1)), given N units
2) UNKNOWN_ID (-1) which indicates it does not know which ID
3) MULTIPLE_ID (-2) which indicates the thread is spread across
multiple of this unit (e.g., affinity mask is spread across
multiple hardware threads)
This new information is stored in th_topology_ids[] array. An example
how to get the socket Id, one would read th_topology_ids[KMP_HW_SOCKET].
This could be expanded in the future to something more descriptive for
the "multiple" case, like a range of values. For now, the single
value suffices.
The information regarding the core attributes can take on two values:
1) The actual core-type or core-eff
2) KMP_HW_CORE_TYPE_UNKNOWN if the core type is unknown, and
UNKNOWN_CORE_EFF (-1) if the core eff is unknown.
This new information is stored in th_topology_attrs. An example
how to get the core type, one would read
th_topology_attrs.core_type.
Differential Revision: https://reviews.llvm.org/D139854
This fixes the following test cases:
* affinity/kmp-affinity.c
* affinity/kmp-hw-subset.c
* affinity/omp-places.c
Differential Revision: https://reviews.llvm.org/D139802
Fix setting affinity type and topology method when affinity is disabled
and fix places that were not taking into account that affinity can be
explicitly disabled by putting proper KMP_AFFINITY_CAPABLE() check.
Differential Revision: https://reviews.llvm.org/D137176
Add new hidden helper affinity via the environment variable,
KMP_HIDDEN_HELPER_AFFINITY, which allows users to assign thread
affinity to hidden helper threads using the same syntax as
KMP_AFFINITY. OMP_PLACES/OMP_PROC_BIND have no interaction with
KMP_HIDDEN_HELPER_AFFINITY.
Differential Revision: https://reviews.llvm.org/D135113
Separate change for the warnings to depend on the relevant affinity
settings verbose and warnings settings.
Differential Revision: https://reviews.llvm.org/D135112
This patch parameterizes the affinity initialization code to allow multiple
affinity settings. Almost all global affinity settings are consolidated
and put into a structure kmp_affinity_t. This is in anticipation of the
addition of hidden helper affinity which will have the same syntax and
semantics as KMP_AFFINITY only for the hidden helper team.
Differential Revision: https://reviews.llvm.org/D135109
Warnings that occur during affinity initialization are supposed
to be guarded by KMP_AFFINITY=nowarnings,noverbose, but some had been
missed by this logic. Create one macro for affinity warnings that takes
these settings into account.
Differential Revision: https://reviews.llvm.org/D125991
Added control to reset affinity of primary thread after outermost parallel
region to initial affinity encountered before OpenMP runtime was initialized.
KMP_AFFINITY environment variable reset/noreset modifier introduced.
Default behavior is unchanged.
Differential Revision: https://reviews.llvm.org/D125993
Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI
compiler.
Fixup flag detection and compiler ID detection in CMake. Older CMake's
detect IntelLLVM as Clang.
Fix compiler warnings.
Fixup many of the tests to have non-empty parallel regions as they are
elided by oneAPI compiler.
This patch allows the user to request all resources of a particular
layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified
so the number of units requested is optional or can be replaced with an
'*' character.
e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3
e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores
e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per
each socket and 1 thread per core.
Differential Revision: https://reviews.llvm.org/D115826
Allow filtering of resources based on core attributes. There are two new
attributes added:
1) Core Type (intel_atom, intel_core)
2) Core Efficiency (integer) where the higher the efficiency, the more
performant the core
On hybrid architectures , e.g., Alder Lake, users can specify
KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom
and first four Big cores. The can also use the efficiency syntax. e.g.,
KMP_HW_SUBSET=2c:eff0,2c:eff1
Differential Revision: https://reviews.llvm.org/D114901
Teach the HWLOC topology method how to detect Atom and Core
types so hybrid CPUs are properly detected and represented when using
the HWLOC topology method.
Differential Revision: https://reviews.llvm.org/D112270
The current implementation of Windows Processor Groups has
a separate topology method to handle them. This patch deprecates
that specific method and uses the regular CPUID topology
method by default and inserts the Windows Processor Group objects
in the topology manually.
Notes:
* The preference for processor groups is lowered to a value less than
socket so that the user will see sockets in the KMP_AFFINITY=verbose
output instead of processor groups when sockets=processor groups.
* The topology's capacity is modified to handle additional topology layers
without the need for reallocation.
* If a user asks for a granularity setting that is "above" the processor
group layer, then the granularity is adjusted "down" to the processor
group since this is the coarsest layer available for threads.
Differential Revision: https://reviews.llvm.org/D112273
If some CPUs are offline, then make sure they are not included in the
fullMask even if norespect is given to KMP_AFFINITY.
Differential Revision: https://reviews.llvm.org/D112274
Remove restriction forcing users to specify the KMP_HW_SUBSET value in
topology order. This patch sorts the user KMP_HW_SUBSET value before
trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c
Differential Revision: https://reviews.llvm.org/D112027
Detect, through CPUID.1A, and show user different core types through
KMP_AFFINITY=verbose mechanism. Offer future runtime optimizations
__kmp_is_hybrid_cpu() to know whether running on a hybrid system or not.
Differential Revision: https://reviews.llvm.org/D110435
Put declarations/definitions of unused variables under corresponding macros
to silence clang build warnings.
Differential Revision: https://reviews.llvm.org/D106608
Several variables were left unused as a result of different patches removing
their use.
Two variables have some use:
`poll_count` is used by the KMP_BLOCKING macro only under certain conditions.
Adding (void) to tell the compiler to ignore the unused variable.
`padding` is a dummy stack allocation with no intent to be used. Also adding
(void) to make the compiler ignore the unused variable.
Differential Revision: https://reviews.llvm.org/D104303
When on KNL and L2 or Tile layer is detected, manually add
the corresponding layer which is equivalent.
Differential Revision: https://reviews.llvm.org/D102865
When KMP_AFFINITY is set, each worker thread's gtid value is used as an
index into the place list to determine the thread's placement. With hidden
helpers enabled, this gtid value is shifted down leading to unexpected
shifted thread placement. This patch restores the previous behavior by
adjusting the mask index to take the number of hidden helper threads
into account.
Hidden helper threads are given the full initial mask and do not
participate in any of the other affinity mechanisms (place partitioning,
balanced affinity). Their affinity is only printed for debug builds.
Differential Revision: https://reviews.llvm.org/D101882
This patch does the following:
1) Introduce kmp_topology_t as the runtime-friendly structure (the
corresponding global variable is __kmp_topology) to determine the
exact machine topology which can vary widely among current and future
architectures. The current design is not easy to expand beyond the assumed
three layer topology: sockets, cores, and threads so a rework capable of
using the existing KMP_AFFINITY mechanisms is required.
This new topology structure has:
* The depth and types of the topology
* Ratio count for each consecutive level (e.g., number of cores per
socket, number of threads per core)
* Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
* Equivalent topology layer map (e.g., Numa domain is equivalent to
socket, L1/L2 cache equivalent to core)
* Whether it is uniform or not
The hardware threads are represented with the kmp_hw_thread_t
structure. This structure contains the ids (e.g., socket 0, core 1,
thread 0) and other information grabbed from the previous Address
structure. The kmp_topology_t structure contains an array of these.
2) Generalize the KMP_HW_SUBSET envirable for the new
kmp_topology_t structure. The algorithm doesn't assume any order with
tiles,numa domains,sockets,cores,threads. Instead it just parses the
envirable, makes sure it is consistent with the detected topology
(including taking into account equivalent layers) and then trims away
the unneeded subset of hardware threads. To enable this, a new
kmp_hw_subset_t structure is introduced which contains a vector of
items (hardware type, number user wants, offset). Any keyword within
__kmp_hw_get_keyword() can be used as a name and can be shortened as
well. e.g.,
KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.
3) Simplify topology detection functions so they only do the singular
task of detecting the machine's topology. Printing, and all
canonicalizing functionality is now done afterwards. So many lines of
duplicated code are eliminated.
4) Add new ll_caches and numa_domains to OMP_PLACES, and
consequently, KMP_AFFINITY's granularity setting. All the names within
__kmp_hw_get_keyword() are available for use in OMP_PLACES or
KMP_AFFINITY's granularity setting.
5) Simplify and future-proof code where explicit lists of allowed
affinity settings keywords inside if() conditions.
6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method
so equivalent caches could be detected (in particular for the ll_caches
place).
Differential Revision: https://reviews.llvm.org/D100997
Cleanup changes:
- check value read from file;
- remove dead code;
- make unsigned variable to read hexadecimal number to;
- add debug assertion to check ref count.
Differential Revision: https://reviews.llvm.org/D96893
These variables are used only in certain build configurations,
or marked with a todo comment indicating that they should be
used/checked/reported.
Differential Revision: https://reviews.llvm.org/D96582
New affinity patch introduced legitimate sign-compare warnings that
clang doesn't report but GCC-10 does. This removes the warnings by
changing two variables types to unsigned.
Differential Revision: https://reviews.llvm.org/D95818
This patch adds the new algorithm for topology discovery using cpuid
leaf 1f. Only the new die level is detected and integrated into the
current affinity mechanisms including KMP_AFFINITY (granularity level
and compact/scatter algorithm), OMP_PLACES=dies, and KMP_HW_SUBSET.
Differential Revision: https://reviews.llvm.org/D95157
HWLOC 2.0 has numa nodes as separate children and are not in the main
parent/child topology tree anymore. This change takes this into
account. The main topology detection loop in the create_hwloc_map()
routine starts at a hardware thread within the initial affinity mask and
goes up the topology tree setting the socket/core/thread labels
correctly.
This change also introduces some of the more generic changes that the
future kmp_topology_t structure will take advantage of including a
generic ratio & count array (finding all ratios of topology layers like
threads/core cores/socket and finding all counts of each topology
layer), generic radix1 reduction step, generic uniformity check, and
generic printing of topology (en_US.txt)
Differential Revision: https://reviews.llvm.org/D95156
This patch partially prepares the runtime source code to be built with
-Wconversion, which should trigger warnings if any implicit conversions
can possibly change a value. For builds done with icc or gcc, all such
warnings are handled in this patch. clang gives a much longer list of
warnings, particularly for sign conversions, which the other compilers
don't report. The -Wconversion flag is commented into cmake files, but
I'm not going to turn it on. If someone thinks it is important, and wants
to fix all the clang warnings, they are welcome to.
Types of changes made here involve either improving the consistency of types
used so that no conversion is needed, or else performing careful explicit
conversions, when we're sure a problem won't arise.
Patch is a combination of changes by Terry Wilmarth and Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D92942
Introduce new kmp_safe_raii_file_t class with RAII semantics for file
open/close. It is essentially a wrapper around the C-style FILE* object.
This also unifies the way we error report if a file can't be opened.
Differential Revision: https://reviews.llvm.org/D92604
KMP_AFFINITY=norespect was triggering an error because the underlying
process affinity mask was not updated to include the entire machine.
The Windows documentation states that the thread affinities must be
subsets of the process affinity. This patch also moves the printing
(for KMP_AFFINITY=verbose) of whether the initial mask was respected
out of each topology detection function and to one location where the
initial affinity mask is read.
Differential Revision: https://reviews.llvm.org/D92587