llvm-project

Author	SHA1	Message	Date
Nikita Popov	4722c6b87c	[openmp] Use core_siblings_list if physical_package_id not available (#111831 ) On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to flat topology and various affinity tests fail. Fix this by parsing core_siblings_list to determine which cpus belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809.	2024-10-14 09:23:41 +02:00
Hansang Bae	5989709047	[OpenMP] Miscellaneous small code improvements (#95603 ) Removes a few uninitialized variables, possible resource leaks, and redundant code.	2024-08-15 10:42:22 -05:00
Alexandre Ganea	20baa9a9ec	[openmp][runtime] Silence warnings This fixes several of those when building with MSVC on Windows: ``` [3625/7617] Building CXX object projects\openmp\runtime\src\CMakeFiles\omp.dir\kmp_affinity.cpp.obj C:\src\git\llvm-project\openmp\runtime\src\kmp_affinity.cpp(2637): warning C4062: enumerator 'KMP_HW_UNKNOWN' in switch of enum 'kmp_hw_t' is not handled C:\src\git\llvm-project\openmp\runtime\src\kmp.h(628): note: see declaration of 'kmp_hw_t' ```	2024-08-11 19:01:12 -04:00
Jonathan Peyton	916a91578f	[OpenMP] Assign thread ids in the cpuinfo topology method (#91013 ) On non-hyperthreaded machines, the thread id is not always explicit in the /proc/cpuinfo file. This patch adds a check to ensure the thread ids are put in.	2024-07-29 09:52:02 -05:00
Jonathan Peyton	77ff969e5d	[OpenMP] Add topology and affinity changes for Meteor Lake (#91012 ) These are Intel-specific changes for the CPUID leaf 31 method for detecting machine topology. * Cleanup known levels usage in x2apicid topology algorithm Change to be a constant mask of all Intel topology type values. * Take unknown ids into account when sorting them If a hardware id is unknown, then put further down the hardware thread list so it will take last priority when assigning to threads. * Have sub ids printed out for hardware thread dump * Add caches to topology New `kmp_cache_ids_t` class helps create cache ids which are then put into the topology table after regular topology type ids have been put in. * Allow empty masks in place list creation Have enumeration information and place list generation take into account that certain hardware threads may be lacking certain layers * Allow different procs to have different number of topology levels Accommodates possible situation where CPUID.1F has different depth for different hardware threads. Each hardware thread has a topology description which is just a small set of its topology levels. These descriptions are tracked to see if the topology is uniform or not. * Change regular ids with logical ids Instead of keeping the original sub ids that the x2apicid topology detection algorithm gives, change each id to its logical id which is a number: [0, num_items - 1]. This makes inserting new layers into the topology significantly simpler. * Insert caches into topology This change takes into account that most topologies are uniform and therefore can use the quicker method of inserting caches as equivalent layers into the topology.	2024-07-29 09:51:42 -05:00
Xing Xue	690c929b6c	[OpenMP][AIX] Use syssmt() to get the number of SMTs per physical CPU (#89985 ) This patch changes to use system call `syssmt()` instead of `lpar_get_info()` to get the number of SMTs (logical processors) per physical processor for AIX. `lpar_get_info()` gives the max number of SMTs that the physical processor can support while `syssmt()` returns the number that is currently configured.	2024-04-26 13:23:33 -04:00
Jonathan Peyton	2ff3850ea1	[OpenMP] Add absolute KMP_HW_SUBSET functionality (#85326 ) Users can put a : in front of KMP_HW_SUBSET to indicate that the specified subset is an "absolute" subset. Currently, when a user puts KMP_HW_SUBSET=1t, this gets translated to KMP_HW_SUBSET="*s,*c,1t", where * means "use all of". If a user wants only one thread as the entire topology they can now do KMP_HW_SUBSET=:1t. Along with the absolute syntax is a fix for newer machines and making them easier to use with only the 3-level topology syntax. When a user puts KMP_HW_SUBSET=1s,4c,2t on a machine which actually has 4 layers, (say 1s,2m,3c,2t as the entire machine) the user gets an unexpected "too many resources asked" message because KMP_HW_SUBSET currently translates the "4c" value to mean 4 cores per module. To help users out, the runtime can assume that these newer layers, module in this case, should be ignored if they are not specified, but the topology should always take into account the sockets, cores, and threads layers.	2024-04-03 11:43:23 -05:00
Xing Xue	d394f3a162	[OpenMP][AIX] Affinity implementation for AIX (#84984 ) This patch implements `affinity` for AIX, which is quite different from platforms such as Linux. - Setting CPU affinity through masks and related functions are not supported. System call `bindprocessor()` is used to bind a thread to one CPU per call. - There are no system routines to get the affinity info of a thread. The implementation of `get_system_affinity()` for AIX gets the mask of all available CPUs, to be used as the full mask only. - Topology is not available from the file system. It is obtained through system SRAD (Scheduler Resource Allocation Domain). This patch has run through the libomp LIT tests successfully with `affinity` enabled.	2024-03-22 15:25:08 -04:00
MessyHack	ea848d0a6d	[OpenMP] Sort topology after adding processor group layer. (#83943 ) Various behavior around creating affinity masks and detecting uniform topology depends on the topology being sorted. Re-sort the topology after adding the processor group layer to ensure that the updated topology reflects the newly added processor group info. Observed that the topology was not sorted correctly on high core count AMD Epyc Genoa (2 sockets, 96 cores, 2 threads) using NUMA (NPS 2+).	2024-03-13 16:22:23 -05:00
Jonathan Peyton	f5334f5da5	[OpenMP] Add debug checks for divide by zero (#83300 )	2024-03-12 11:36:19 -07:00
Jonathan Peyton	9b1c496898	[OpenMP] Fixup while loops to avoid bad NULL check (#83302 )	2024-03-11 10:28:12 -05:00
David CARLIER	fa4cc39255	[openmp] adding affinity support to DragonFlyBSD. (#84672 )	2024-03-10 09:56:55 +00:00
David CARLIER	11cd2a33f1	[openmp] porting affinity feature to netbsd. (#84618 ) netbsd supports the portable hwloc's layer as well. for a hardware with 4 cpus, a cpu set is 4 and maxcpus is 256.	2024-03-09 11:45:07 +00:00
Alexandre Ganea	15fdc7646c	Re-land [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853 ) This reverts 94f960925b7f609636fc2ffd83053814d5e45ed1 and fixes it.	2024-01-23 12:48:38 -05:00
Alexandre Ganea	94f960925b	Revert 10f3296dd7d74c975f208a8569221dc8f96d1db1 - [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853 ) It broke the AMDGPU buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/45378	2024-01-23 08:51:12 -05:00
Alexandre Ganea	10f3296dd7	[openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853 ) There were quite a few compilation warnings when building openmp on Windows with the latest Visual Studio 2022 version 17.8.4. Some other warnings were visible with the latest Clang at tip. This commit fixes all of them.	2024-01-23 08:38:18 -05:00
Alexandre Ganea	0ac992e0ad	[openmp] Revert 64874e5ab5fd102344d43ac9465537a44130bf19 since it was committed by mistake and the PR (https://github.com/llvm/llvm-project/pull/77853 ) wasn't approved yet.	2024-01-18 13:55:03 -05:00
Alexandre Ganea	64874e5ab5	[openmp] Silence warnings when building the LLVM release with MSVC	2024-01-17 07:23:58 -05:00
Jonathan Peyton	5cc603cb22	[OpenMP] Add skewed iteration distribution on hybrid systems (#69946 ) This commit adds skewed distribution of iterations in nonmonotonic:dynamic schedule (static steal) for hybrid systems when thread affinity is assigned. Currently, it distributes the iterations at 60:40 ratio. Consider this loop with dynamic schedule type, for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware threads (16 CORE and 4 ATOM core), 88 iterations will be assigned to performance cores and 12 iterations will be assigned to efficient cores. Each thread with CORE core will process 5 iterations + extras and with ATOM core will process 3 iterations. Differential Revision: https://reviews.llvm.org/D152955	2023-11-08 10:19:37 -06:00
Neale Ferguson	1111ef0257	Add openmp support to System z (#66081 ) * openmp/README.rst - Add s390x to those platforms supported * openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory * openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions * openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported * openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE * openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x) * openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix * openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIBOMP_ARCH * openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get|set]affinity for s390x * openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x * openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks * openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X * openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X * openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x * openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x * openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture * openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x	2023-11-03 12:42:55 +01:00
Fangrui Song	678e3ee123	[lldb] Fix duplicate word typos; NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 21:32:24 -07:00
Jonathan Peyton	99f5969565	[OpenMP] Let primary thread gather topology info for each worker thread This change has the primary thread create each thread's initial mask and topology information so it is available immediately after forking. The setting of mask/topology information is decoupled from the actual binding. Also add this setting of topology information inside the __kmp_partition_places mechanism for OMP_PLACES+OMP_PROC_BIND. Without this, there could be a timing window after the primary thread signals the workers to fork where worker threads have not yet established their affinity mask or topology information. Each worker thread will then bind to the location the primary thread sets. Differential Revision: https://reviews.llvm.org/D156727	2023-08-22 15:56:51 -05:00
Jonathan Peyton	b34c7d8c8e	[OpenMP] Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY * Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API * Add printout of leader to hardware thread dump * Allow OMP_PLACES to restrict fullMask This change fixes an issue with the OMP_PLACES=resource(#) syntax. Before this change, specifying the number of resources did NOT change the default number of threads created by the runtime. e.g., OMP_PLACES=cores(2) would still create __kmp_avail_proc number of threads. After this change, the fullMask and __kmp_avail_proc are modified if necessary so that the final place list dictates which resources are available and thus how many threads are created by default. * Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY For OMP_PLACES, two new features are added: 1) OMP_PLACES=cores:<attribute> where <attribute> is either intel_atom, intel_core, or eff# where # is 0 - number of core efficiencies-1. This syntax also supports the optional (#) number selection of resources. 2) OMP_PLACES=core_types|core_effs where this setting will create the number of core_types (or core_effs|core_efficiencies). For KMP_AFFINITY, the granularity setting is expanded to include two new keywords: core_type, and core_eff (or core_efficiency). This will set the granularity to include all cores with a particular core type (or efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will create threads which can float across a single core type. Differential Revision: https://reviews.llvm.org/D154547	2023-07-31 13:55:32 -05:00
Jonathan Peyton	1e3bbf76a1	[OpenMP] Re-use affinity raii class in worker spawning Get rid of explicit mask alloc, getthreadaffinity, set temp affinity, reset to old affinity, dealloc steps in favor of existing kmp_affinity_raii_t to push/pop a temporary affinity. Differential Revision: https://reviews.llvm.org/D154650	2023-07-24 15:58:25 -05:00
Jonathan Peyton	05e2bc25e8	[OpenMP] Ensure socket layer is not first in CPUID topology detection * Return 0 length topology if socket layer is detected first * Fix DEBUG ASSERT	2023-07-06 12:35:34 -05:00
Gilles Gouaillardet	3a362a9f38	[OpenMP][libomp] Insert correct HWLOC version guards Put needed HWLOC version guards around relevant HWLOC API. Tested OpenMP host runtime build with HWLOC 1.11.13, 2.0-2.9. Differential Revision: https://reviews.llvm.org/D142152 Fix #54951	2023-01-19 14:30:43 -06:00
Jonathan Peyton	2aea0a9de0	[OpenMP][libomp] Switch Intel topology type values: module, tile According to Software Developer Manual, modules should be value 3 and tile should be value 4.	2023-01-18 12:11:43 -06:00
Jonathan Peyton	f4cce0f47b	[OpenMP][libomp] Add topology information to thread structure Each time a thread gets a new affinity assigned, it will not only assign its mask, but also topology information including which socket, core, thread and core-attributes (if available) it is now assigned. This occurs for all non-disabled KMP_AFFINITY values as well as OMP_PLACES/OMP_PROC_BIND. The information regarding which socket, core, etc. can take on three values: 1) The actual ID of the unit (0 - (N-1)), given N units 2) UNKNOWN_ID (-1) which indicates it does not know which ID 3) MULTIPLE_ID (-2) which indicates the thread is spread across multiple of this unit (e.g., affinity mask is spread across multiple hardware threads) This new information is stored in th_topology_ids[] array. An example how to get the socket Id, one would read th_topology_ids[KMP_HW_SOCKET]. This could be expanded in the future to something more descriptive for the "multiple" case, like a range of values. For now, the single value suffices. The information regarding the core attributes can take on two values: 1) The actual core-type or core-eff 2) KMP_HW_CORE_TYPE_UNKNOWN if the core type is unknown, and UNKNOWN_CORE_EFF (-1) if the core eff is unknown. This new information is stored in th_topology_attrs. An example how to get the core type, one would read th_topology_attrs.core_type. Differential Revision: https://reviews.llvm.org/D139854	2023-01-16 23:04:06 -06:00
gonglingqin	9a0831afa0	[OpenMP] Skip extra blank line when parsing /proc/cpuinfo on LoongArch64 This fixes the following test cases: * affinity/kmp-affinity.c * affinity/kmp-hw-subset.c * affinity/omp-places.c Differential Revision: https://reviews.llvm.org/D139802	2022-12-13 20:13:10 +08:00
Jonathan Peyton	96696b882b	[OpenMP][libomp] Fix disabled affinity Fix setting affinity type and topology method when affinity is disabled and fix places that were not taking into account that affinity can be explicitly disabled by putting proper KMP_AFFINITY_CAPABLE() check. Differential Revision: https://reviews.llvm.org/D137176	2022-11-02 15:37:41 -05:00
Jonathan Peyton	7a9643fd2a	[OpenMP][libomp] Add hidden helper affinity Add new hidden helper affinity via the environment variable, KMP_HIDDEN_HELPER_AFFINITY, which allows users to assign thread affinity to hidden helper threads using the same syntax as KMP_AFFINITY. OMP_PLACES/OMP_PROC_BIND have no interaction with KMP_HIDDEN_HELPER_AFFINITY. Differential Revision: https://reviews.llvm.org/D135113	2022-10-28 15:21:07 -05:00
Jonathan Peyton	b03d67f7f5	[OpenMP][libomp] Make affinity warnings parameterized Separate change for the warnings to depend on the relevant affinity settings verbose and warnings settings. Differential Revision: https://reviews.llvm.org/D135112	2022-10-28 15:21:07 -05:00
Jonathan Peyton	174502fc14	[OpenMP][libomp] Parameterize affinity functions This patch parameterizes the affinity initialization code to allow multiple affinity settings. Almost all global affinity settings are consolidated and put into a structure kmp_affinity_t. This is in anticipation of the addition of hidden helper affinity which will have the same syntax and semantics as KMP_AFFINITY only for the hidden helper team. Differential Revision: https://reviews.llvm.org/D135109	2022-10-28 15:21:06 -05:00
Jonathan Peyton	f8d081c1a5	[OpenMP][libomp] Allow unused-but-set warnings Only a few remaining which are taken care of by this patch. Differential Revision: https://reviews.llvm.org/D133528	2022-10-03 10:24:33 -05:00
Jonathan Peyton	40ce65b5b2	[OpenMP][libomp] Fix affinity warnings and unify under one macro Warnings that occur during affinity initialization are supposed to be guarded by KMP_AFFINITY=nowarnings,noverbose, but some had been missed by this logic. Create one macro for affinity warnings that takes these settings into account. Differential Revision: https://reviews.llvm.org/D125991	2022-07-19 13:10:25 -05:00
AndreyChurbanov	17dcde5f1b	[OpenMP][libomp] Allow reset affinity mask after parallel Added control to reset affinity of primary thread after outermost parallel region to initial affinity encountered before OpenMP runtime was initialized. KMP_AFFINITY environment variable reset/noreset modifier introduced. Default behavior is unchanged. Differential Revision: https://reviews.llvm.org/D125993	2022-07-19 13:05:05 -05:00
Jonathan Peyton	d49ce7c356	[OpenMP][libomp] Replace global variable references with local object Remove references to global __kmp_topology within a kmp_topology_t object method. There should just be implicit references to the private object.	2022-04-12 12:50:41 -05:00
Jonathan Peyton	1234011b80	[OpenMP][libomp] Introduce oneAPI compiler support Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI compiler. Fixup flag detection and compiler ID detection in CMake. Older CMake's detect IntelLLVM as Clang. Fix compiler warnings. Fixup many of the tests to have non-empty parallel regions as they are elided by oneAPI compiler.	2022-02-14 14:10:33 -06:00
Jonathan Peyton	6be7c21b57	[OpenMP][libomp] Replace accidental VLA with KMP_ALLOCA MSVC does not support variable length arrays. Replace with KMP_ALLOCA which is already used in the same file for stack-allocated variables.	2022-02-09 08:09:27 -06:00
Jonathan Peyton	6a556ecaf4	[OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET This patch allows the user to request all resources of a particular layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified so the number of units requested is optional or can be replaced with an '*' character. e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3 e.g., KMP_HW_SUBSET=c:intel_core will use all the big cores e.g., KMP_HW_SUBSET=s,c,1t will use all the sockets, all cores per each socket and 1 thread per core. Differential Revision: https://reviews.llvm.org/D115826	2021-12-20 13:45:21 -06:00
Jonathan Peyton	9769340905	[OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes Add missing guards around x86-specific code. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115664	2021-12-14 08:33:05 +01:00
Jonathan Peyton	df20599597	[OpenMP][libomp] Add core attributes to KMP_HW_SUBSET Allow filtering of resources based on core attributes. There are two new attributes added: 1) Core Type (intel_atom, intel_core) 2) Core Efficiency (integer) where the higher the efficiency, the more performant the core On hybrid architectures, e.g., Alder Lake, users can specify KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom and first four Big cores. They can also use the efficiency syntax. e.g., KMP_HW_SUBSET=2c:eff0,2c:eff1 Differential Revision: https://reviews.llvm.org/D114901	2021-12-10 14:34:33 -06:00
Peyton, Jonathan L	a733b18bdb	[OpenMP][libomp] Enable HWLOC topology detection of multiple CPU kinds Teach the HWLOC topology method how to detect Atom and Core types so hybrid CPUs are properly detected and represented when using the HWLOC topology method. Differential Revision: https://reviews.llvm.org/D112270	2021-11-17 16:30:18 -06:00
Peyton, Jonathan L	286094af9b	[OpenMP][libomp] Improve Windows Processor Group handling within topology The current implementation of Windows Processor Groups has a separate topology method to handle them. This patch deprecates that specific method and uses the regular CPUID topology method by default and inserts the Windows Processor Group objects in the topology manually. Notes: * The preference for processor groups is lowered to a value less than socket so that the user will see sockets in the KMP_AFFINITY=verbose output instead of processor groups when sockets=processor groups. * The topology's capacity is modified to handle additional topology layers without the need for reallocation. * If a user asks for a granularity setting that is "above" the processor group layer, then the granularity is adjusted "down" to the processor group since this is the coarsest layer available for threads. Differential Revision: https://reviews.llvm.org/D112273	2021-11-17 16:29:01 -06:00
Peyton, Jonathan L	1dd797168e	[OpenMP][libomp] Add support for offline CPUs in Linux If some CPUs are offline, then make sure they are not included in the fullMask even if norespect is given to KMP_AFFINITY. Differential Revision: https://reviews.llvm.org/D112274	2021-11-17 16:28:01 -06:00
Peyton, Jonathan L	a0afb9d0fc	[OpenMP][libomp] Allow users to specify KMP_HW_SUBSET in any order Remove restriction forcing users to specify the KMP_HW_SUBSET value in topology order. This patch sorts the user KMP_HW_SUBSET value before trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c Differential Revision: https://reviews.llvm.org/D112027	2021-11-17 15:27:37 -06:00
Peyton, Jonathan L	acb3b187c4	[OpenMP][host runtime] Add initial hybrid CPU support Detect, through CPUID.1A, and show user different core types through KMP_AFFINITY=verbose mechanism. Offer future runtime optimizations __kmp_is_hybrid_cpu() to know whether running on a hybrid system or not. Differential Revision: https://reviews.llvm.org/D110435	2021-10-14 16:49:42 -05:00
AndreyChurbanov	52cac541d4	[OpenMP] libomp: cleanup: minor fixes to silence static analyzer. Added couple more checks to silence KlocWork static code analyzer. Differential Revision: https://reviews.llvm.org/D107348	2021-08-16 13:39:23 +03:00
AndreyChurbanov	8b81524c6d	[OpenMP][NFC] libomp: silence warnings on unused variables. Put declarations/definitions of unused variables under corresponding macros to silence clang build warnings. Differential Revision: https://reviews.llvm.org/D106608	2021-07-30 17:04:42 +03:00
Joachim Protze	cff215565e	[OpenMP] Remove unused variables from libomp code Several variables were left unused as a result of different patches removing their use. Two variables have some use: `poll_count` is used by the KMP_BLOCKING macro only under certain conditions. Adding (void) to tell the compiler to ignore the unused variable. `padding` is a dummy stack allocation with no intent to be used. Also adding (void) to make the compiler ignore the unused variable. Differential Revision: https://reviews.llvm.org/D104303	2021-06-16 09:33:46 +02:00
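
Commit 4722c6b87c above describes parsing core_siblings_list to work out which CPUs share a socket. The idea can be sketched as follows. This is only an illustration, not the runtime's code (libomp is C++); Python is used for brevity. The function names `parse_cpu_list` and `group_by_socket` are hypothetical, and the sketch assumes the simple "N" and "N-M" comma-separated list form only.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-3,8,10-11" into a sorted list of CPU ids.

    Assumption: only single ids and inclusive "lo-hi" ranges appear.
    """
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields, e.g. an empty file
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_socket(siblings_by_cpu):
    """Map each CPU id to a socket index.

    siblings_by_cpu maps a CPU id to the text of its sibling list
    (hypothetical input; on Linux this would come from sysfs).
    CPUs whose sibling sets are identical are placed in the same socket.
    """
    socket_of = {}
    seen = {}  # sibling set -> socket index, in order of first appearance
    for cpu in sorted(siblings_by_cpu):
        key = tuple(parse_cpu_list(siblings_by_cpu[cpu]))
        if key not in seen:
            seen[key] = len(seen)
        socket_of[cpu] = seen[key]
    return socket_of
```

For instance, `group_by_socket({0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3"})` places CPUs 0 and 1 in one socket and CPUs 2 and 3 in another.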

151 Commits