llvm-project

Author	SHA1	Message	Date
Shilei Tian	600e0cde3e	[NFC][OpenMP] clang-format `openmp/libomptarget/src/interface.cpp`	2023-09-08 23:01:39 -04:00
Brad Smith	7e31b45d6a	[OpenMP] Use the more appropriate function to retrieve the thread id on OpenBSD (#65553) Use the getthrid() function instead of a syscall.	2023-09-07 21:05:25 -04:00
Shilei Tian	010a5a737b	[OpenMP] Fix build issue with `libomp` when OMPT is disabled	2023-09-06 23:40:24 -04:00
Brad Smith	fd4c80dec9	[OpenMP] Fix gettid warnings on DragonFly (#65549) Define __kmp_gettid() as appropriate for DragonFly.	2023-09-06 20:21:11 -04:00
Shilei Tian	99d67fb9aa	[OpenMP] Align up the size when calling aligned_alloc (#65525) Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size` is supposed to be a multiple of `alignment`, and it is implementation-defined behavior if not. We have a non-conformant use in `kmp_barrier.h` when allocating the distributed barrier. The size of the barrier is 576 and the alignment is `4*CACHE_LINE`, which is 256 on most systems. Apparently it works perfectly fine for Linux and Intel-based Mac, but not for Apple Silicon based Mac. Fix #63194.	2023-09-06 16:28:07 -04:00
Joseph Huber	460840c09d	[OpenMP] Support 'omp_get_num_procs' on the device (#65501) Summary: The `omp_get_num_procs()` function should return the amount of parallelism available. On the GPU, this was not defined. We have elected to define this function as the maximum number of wavefronts / warps that can be simultaneously resident on the device. For AMDGPU this is the number of CUs multiplied by the number of waves per CU. For NVPTX this is the maximum threads per SM divided by the warp size and multiplied by the number of SMs.	2023-09-06 13:45:05 -05:00
Shilei Tian	ff5c7261ef	[OpenMP] Fix a wrong assertion in `__kmp_get_global_thread_id` The function assumes that `__kmp_gtid_get_specific` always returns a valid gtid. That is not always true, because when creating the key for thread-specific data, a destructor is assigned. The dtor will be called at thread exit. However, before the dtor is called, the thread-specific data will be reset to NULL first (https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html): > At thread exit, if a key value has a non-NULL destructor pointer, and the thread has a non-NULL value associated with that key, the value of the key is set to NULL. As a result, `__kmp_gtid_get_specific` returns `KMP_GTID_DNE`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159369	2023-09-06 12:21:43 -04:00
Shilei Tian	518b08c193	[OpenMP] Fix issue of indirect function call in `__kmpc_fork_call_if` (#65436) The outlined function is typically invoked by using `__kmp_invoke_microtask`, which is written in asm. D138495 introduces a new interface function for parallel region for OpenMPIRBuilder, where the outlined function is called via the function pointer. For some reason, it works perfectly well on x86 and x86-64 systems, but doesn't work on Apple Silicon. The 3rd argument in the callee is always `nullptr`, even if it is not in the caller. It appears `x2` always contains `0x0`. This patch adopts the typical method to invoke the function pointer. It works on my M2 Ultra Mac. Fix #63194.	2023-09-06 12:17:45 -04:00
Fangrui Song	678e3ee123	[lldb] Fix duplicate word typos; NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 21:32:24 -07:00
Ethan Luis McDonough	2b6ba8c735	[openmp] Tighten flang detection in offloading test This patch ensures that the locally built version of flang is used when building in-tree. `find_program` sometimes used the wrong executable if a different copy of flang was installed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159161	2023-09-01 13:59:18 -05:00
Martin Storsjö	c2019c416c	[OpenMP] [test] Fix target_thread_limit.cpp to not assume 4 or more cores Previously, the test ran a section with #pragma omp target thread_limit(4) and expected it to execute exactly 4 times, even though it would in practice execute min(cores, 4) times. Increment a counter and check that it executed 1-4 times. Differential Revision: https://reviews.llvm.org/D159311	2023-09-01 21:16:58 +03:00
Jan Leyonberg	a0e3418bc8	[flang][OpenMP] Add fortran test with basic target region This patch adds a test that uses a target region to set a scalar value. It also adds rules in lit.cfg to handle fortran testing. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159216	2023-09-01 09:26:36 -04:00
Shilei Tian	35fdf8d703	[OpenMP] Fix a segment fault in __kmp_get_global_thread_id In `__kmp_get_global_thread_id`, if the gtid mode is 1, after getting the gtid from TLS, it will store the gtid value to the thread stack maintained in the thread descriptor. However, `__kmp_get_global_thread_id` can be called when the library is destructed, after the corresponding thread info has been released. This will cause a segmentation fault. This can happen on an Intel-based Mac. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159324	2023-08-31 21:15:28 -04:00
Joseph Huber	ccb1d183c3	[OpenMP][Docs] Remove old entry saying static libraries are unsupported Summary: Static libraries have been supported since LLVM 15.0; this entry is misleading and should be removed.	2023-08-30 06:48:57 -05:00
Ethan Luis McDonough	9e3d59e4c2	[openmp] Fix flang detection for offloading test This patch fixes the flang detection in the openmp fortran offloading test. Reviewed By: jsjodin Differential Revision: https://reviews.llvm.org/D158546	2023-08-29 16:31:03 -05:00
Martin Storsjö	81ecc887aa	[OpenMP] Export __kmpc_set_thread_limit on Windows This fixes the new test target/target_thread_limit.cpp on Windows, which was added recently in 08bbff4aad57c70a38d5d2680a61901977e66637 / https://reviews.llvm.org/D152054. Differential Revision: https://reviews.llvm.org/D159070	2023-08-29 23:22:21 +03:00
Saiyedul Islam	f616c3eeb4	[OpenMP][DeviceRTL][AMDGPU] Support code object version 5 Update DeviceRTL and the AMDGPU plugin to support code object version 5. Default is code object version 4. CodeGen for __builtin_amdgpu_workgroup_size generates code for cov4 as well as cov5 if -mcode-object-version=none is specified. DeviceRTL compilation passes this argument via Xclang option to generate abi-agnostic code. Generated code for the above builtin uses a clang control constant "llvm.amdgcn.abi.version" to branch on the abi version, which is available during linking of user's OpenMP code. Load of this constant gets eliminated during linking. AMDGPU plugin queries the ELF for code object version and then prepares various implicitargs accordingly. Differential Revision: https://reviews.llvm.org/D139730 Reviewed By: jhuber6, yaxunl	2023-08-29 06:35:44 -05:00
Anton Rydahl	c1b5674fbb	[OpenMP] Change OpenMP default version in documentation and help text for -fopenmp-version As discussed at the weekly OpenMP meeting on the second of August 2023, the default version in the OpenMP documentation should be changed from OpenMP 5.0 to 5.1. Differential Revision: https://reviews.llvm.org/D156901	2023-08-28 19:05:55 -07:00
Doru Bercea	2102ed0b91	Fix for openmp tests honoring thread_limit. Diff: https://reviews.llvm.org/D159001	2023-08-28 13:04:17 -04:00
Doru Bercea	5fe6f56563	Disable intermittently failing OpenMP test. Diff: https://reviews.llvm.org/D159003	2023-08-28 12:56:22 -04:00
Doru Bercea	41bb5ef11f	Add passing test for issue 64797.	2023-08-28 09:55:56 -04:00
Joachim Jenke	1880d8f5c1	[OpenMP][Archer] Add support for taskwait depend At the moment Archer segfaults due to a null-pointer access, if an application uses taskwait with depend clause as used in the two new tests. This patch cleans up the task_schedule function, moves semantic blocks into functions and replaces the if blocks by a single switch statement. The switch statement will warn, when new enum values are added in OMPT and makes clear what code is executed for the different cases. With free-agent tasks coming up in OpenMP 6.0, we should expect more null-pointer task_data, so additional null-pointer checks were added. We also cannot rely on having an implicit task on the stack, so the BarrierIndex is stored during task creation. Differential Revision: https://reviews.llvm.org/D158072	2023-08-28 09:43:24 +02:00
Joachim Jenke	cec855af3e	[OpenMP][OMPT] Fix ompt_get_task_memory implementation Since td_allow_completion_event is a member of the taskdata struct, not all firstprivate/shared variables are stored at the end of the task memory allocation. Simply report the whole allocation instead. Furthermore, the function should always return 0 since in no case there is another block to report. Differential Review: https://reviews.llvm.org/D158080	2023-08-28 09:19:52 +02:00
Sandeep Kosuri	08bbff4aad	[OpenMP] Codegen support for thread_limit on target directive for host offloading - This patch adds support for the thread_limit clause on the target directive according to OpenMP 5.1 [2.14.5] - The idea is to create an outer task for the target region when there is a thread_limit clause, and manipulate the thread_limit of the task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region. Differential Revision: https://reviews.llvm.org/D152054	2023-08-26 22:18:49 -05:00
Shilei Tian	fbcce33706	[OpenMP] Honor `thread_limit` value when choosing grid size D152014 introduced an optimization that favors more smaller blocks over fewer larger blocks, even if the user sets `thread_limit` explicitly. This patch changes the behavior to honor the user's value. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D158802	2023-08-26 22:17:49 -04:00
Joseph Huber	aa78e94b0b	[Libomptarget] Support mapping indirect host calls to device functions The changes in D157738 allowed for us to emit stub globals on the device in the offloading entry section. These globals contain the addresses of device functions and allow us to map host functions to their corresponding device equivalent. This patch provides the initial support required to build a table on the device to lookup the associated value. This is done by finding these entries and creating a global table on the device that can be searched with a simple binary search. This requires an allocation, which supposedly should be automatically freed at plugin shutdown. This includes a basic test which looks up device pointers via a host pointer using the added function. This will need to be built upon to provide full support for these calls in the runtime. To support reverse offloading it would also be useful to provide a reverse table that allows us to get host functions from device stubs. Depends on D157738 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D157918	2023-08-25 18:51:56 -05:00
Michael Halkenhaeuser	9300b6de3c	[OpenMP][OMPT] Add OMPT support for `generic-elf-64bit` plugin Fixes: https://github.com/llvm/llvm-project/issues/64487 Connect OMPT during plugin initialization and enable corresponding tests. Avoid linking OMPT when corresponding support is disabled. Depends on D158542 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D158543	2023-08-25 13:53:11 -04:00
Johannes Doerfert	a01398156a	[OpenMPOpt][FIX] Ensure to propagate information about parallel regions Before, we checked the parallel region only once, and ignored updates in the KernelInfo for the parallel region that happened later. This caused us to think nested parallel sections are not present even if they are, among other things.	2023-08-25 10:46:56 -07:00
Michael Halkenhaeuser	275259eb9a	[OpenMP] Add `getComputeUnitKind` to generic-elf-64bit plugin Make the generic-plugin report a corresponding CU kind -- instead of 'unknown'. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D158542	2023-08-25 07:14:43 -04:00
Johannes Doerfert	d2c37fc4f7	[Attributor][FIX] Avoid dangling stack references in map The old code did not account for new queries during an update, which caused us to leave stack RQIs in the map. We are now explicit about temporary vs non-temporary RQIs. Fixes: https://github.com/llvm/llvm-project/issues/64959	2023-08-24 16:28:10 -07:00
Johannes Doerfert	3611300a32	[OpenMP][FIX] Update tests after D157725	2023-08-24 16:28:10 -07:00
Aaron Jarmusch	1ff0bdb86d	[OpenMP] Fix Slice Duplicate in Profiler Fixed the broken commit - 6579021f02aed021d8cfab808072aa50311e6d12 Fix for the AMDGPU buildbot reported by @jplehr.	2023-08-24 20:52:15 +00:00
Aaron Jarmusch	6579021f02	[OpenMP] Fix Slice Duplicate in Profiler Using LIBOMPTARGET_PROFILER, duplicates are created from timing both Kernel functions and Data update functions. I commented out the duplicate timescope and left them in the targetkernel and the targetdataupdate functions. This way the timescope calls will be closer to the launching of the kernel and the data moving. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D157725	2023-08-24 19:41:37 +00:00
Johannes Doerfert	908ae84351	[OpenMP] Avoid assumptions at the end of a kernel When we used to treat the kernel end as an aligned barrier, assertions at the end made sense. Now, they actually cause problems as the "writes" are not ordered with regards to reads within the kernel. We can simply get rid of them.	2023-08-23 16:11:43 -07:00
Johannes Doerfert	80906ce48d	[OpenMP] Disable early vectorization of loads/stores in the runtime We are having a hard time optimizing some vectorized loads/stores later on which causes this optimization to degrade performance. Differential Revision: https://reviews.llvm.org/D158656	2023-08-23 15:14:14 -07:00
Johannes Doerfert	382b97554d	[OpenMP] Force the parallel abstraction to be inlined This is good for performance and compile time and the indirection (+ switch statements) is nothing that needs to be preserved.	2023-08-23 11:48:18 -07:00
Johannes Doerfert	81a02b0767	[Attributor][NFC] Precommit test	2023-08-23 11:48:18 -07:00
Johannes Doerfert	7481b465ae	[OpenMP] Use default grid value for static grid size If the user did not provide any static clause to override the grid size, we assume the default grid size as upper bound and use it to improve code generation through vendor specific attributes. Fixes: https://github.com/llvm/llvm-project/issues/64816 Differential Revision: https://reviews.llvm.org/D158382	2023-08-23 11:12:03 -07:00
Johannes Doerfert	c5488c8dcc	[OpenMP] Properly set static thread limit (w/o analysis) We used to have two separate implementations to derive the number of threads used in a target region. This led us to sometimes miss out on user provided thread bounds (num_threads, or thread_limit) when we looked for "constant default values". If we might miss out on the presence of those bounds, we cannot set the thread_limit statically since the runtime will try to honor user input rather than cap it at the "preferred default". This patch replaces the secondary implementation with the primary in a mode that will not emit code but just look for the presence, and potentially upper bounds, of thread limiting clauses. The runtime test would not pass without this rewrite as we missed some clauses, set the static limit on the device to the preferred value, but then violated that value at runtime. Fixes: https://github.com/llvm/llvm-project/issues/64845 Differential Revision: https://reviews.llvm.org/D158381	2023-08-23 11:12:03 -07:00
Vadim Paretsky	6789dda762	[OpenMP] make small memory allocations in loop collapse code on the stack A few places in the loop collapse support code make small dynamic allocations that introduce a noticeable performance overhead when made on the heap. This change moves allocations up to 32 bytes to the stack instead of the heap. Differential Revision: https://reviews.llvm.org/D158220	2023-08-23 10:37:45 -07:00
Jonathan Peyton	99f5969565	[OpenMP] Let primary thread gather topology info for each worker thread This change has the primary thread create each thread's initial mask and topology information so it is available immediately after forking. The setting of mask/topology information is decoupled from the actual binding. Also add this setting of topology information inside the __kmp_partition_places mechanism for OMP_PLACES+OMP_PROC_BIND. Without this, there could be a timing window after the primary thread signals the workers to fork where worker threads have not yet established their affinity mask or topology information. Each worker thread will then bind to the location the primary thread sets. Differential Revision: https://reviews.llvm.org/D156727	2023-08-22 15:56:51 -05:00
Michael Halkenhaeuser	57f0bdc8fb	[OpenMP][OMPT] Fix `target enter data` callback ordering & reported device num This patch fixes: https://github.com/llvm/llvm-project/issues/64738 We observed multiple issues, primarily that the `DeviceId` was reported as -1 in certain scenarios. The reason for this is simply that the device is not initialized at that point. Hence, we need to move the RAII object creation just after the `checkDeviceAndCtors`, closer to the actual call we want to observe. This also solves an ordering issue where one `target enter data` callback would be executed before the `Init` callback. Additionally, this change will also fix that the callbacks corresponding to `enter / exit data` and `update` in conjunction with `nowait` would not result in the emission of an OMPT callback. Added a testcase to cover initialized device number and `omp target` constructs. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D157605	2023-08-22 13:12:09 -04:00
Kazu Hirata	11e2975810	Fix typos in documentation	2023-08-18 23:36:04 -07:00
Johannes Doerfert	9c08e76f3e	[Attributor] Introduce AAIndirectCallInfo AAIndirectCallInfo will collect information and specialize indirect call sites. It is similar to our IndirectCallPromotion but runs as part of the Attributor (so with assumed callee information). It also expands more calls and lets the rest of the pipeline figure out what is UB, for now. We use existing call promotion logic to improve the result, otherwise we rely on the (implicit) function pointer cast. This effectively "fixes" #60327 as it will undo the type punning early enough for the inliner to work with the (now specialized, thus direct) call. Fixes: https://github.com/llvm/llvm-project/issues/60327	2023-08-18 16:44:05 -07:00
Terry Wilmarth	f0221fb1d7	[OpenMP] Add option to use different units for blocktime This change adds the option of using different units for blocktimes specified via the KMP_BLOCKTIME environment variable. The parsing of the environment now recognizes unit suffixes: ms and us. If a unit suffix is not specified, the default unit is ms. Thus default behavior is still the same, and any previous usage still works the same. Internally, blocktime is now converted to microseconds everywhere, so settings that exceed INT_MAX in microseconds are considered "infinite". kmp_set/get_blocktime are updated to use the units the user specified with KMP_BLOCKTIME, and if not specified, ms are used. Added better range checking and inform messages for the two time units. Large values of blocktime for the default (ms) case (beyond INT_MAX/1000) are no longer allowed, but will autocorrect with an INFORM message. The delay for determining ticks per usec was lowered. It is now 1 million ticks, which was calculated as ~450us based on a 2.2GHz clock, a pretty typical base clock frequency on X86: (1e6 Ticks) / (2.2e9 Ticks/sec) * (1e6 usec/sec) = 454 usec. Really short benchmarks can be affected by a longer delay. Update KMP_BLOCKTIME docs. Portions of this commit were authored by Johnny Peyton. Differential Revision: https://reviews.llvm.org/D157646	2023-08-18 14:01:13 -05:00
Johannes Doerfert	5eb7a427b0	[Attributor][NFC] Precommit tests	2023-08-17 22:42:38 -07:00
Johannes Doerfert	4fcd5f93d6	[OpenMPOpt] Mark more runtime functions as SPMD compatible Fixes: https://github.com/llvm/llvm-project/issues/64421	2023-08-17 18:33:24 -07:00
Joseph Huber	5717329f1a	[Libomptarget] Disable deadlocking bug49334.cpp test on AMDGPU This test hangs on AMDGPU sporadically, disable it for the time being. Fixes: https://github.com/llvm/llvm-project/issues/64733 Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D158082	2023-08-16 10:24:00 -05:00
Michael Halkenhaeuser	41f3626f8b	[OpenMP][OMPT] Fix reported target pointer for data alloc callback This patch fixes: https://github.com/llvm/llvm-project/issues/64671 DataOp EMI callbacks would not report the correct target pointer. This is now alleviated by passing a `void**` into the function which emits the actual callback, then evaluating that pointer. Note: Since this is only done after the pointer has been properly updated, only `endpoint=2` callbacks will show a non-null value. Reviewed By: dhruvachak, jdoerfert Differential Revision: https://reviews.llvm.org/D157996	2023-08-16 06:39:10 -04:00
Ye Luo	1c822e1e82	[libomptarget] Avoid uninitialized GenericPluginTy::NumDevices	2023-08-13 00:01:50 -05:00
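
Note on commit 99d67fb9aa (align up the size when calling aligned_alloc): the message says the `size` passed to `aligned_alloc` should be a multiple of `alignment`, and cites a 576-byte barrier with a 256-byte alignment. The following is a minimal sketch of that rounding step, not the code from the patch. The helper names `align_up` and `aligned_alloc_padded` are hypothetical, and the sketch assumes the alignment is a nonzero power of two and that the padded size does not overflow `size_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: round `size` up to the next multiple of `alignment`.
 * Assumes `alignment` is a nonzero power of two and that
 * `size + alignment - 1` does not overflow. */
static size_t align_up(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical wrapper: pad the request so that the size handed to
 * aligned_alloc is a multiple of the alignment. Returns NULL on failure;
 * release the result with free(). */
static void *aligned_alloc_padded(size_t alignment, size_t size) {
    return aligned_alloc(alignment, align_up(size, alignment));
}
```

With the values quoted in the commit message, `align_up(576, 256)` gives 768, so the allocation request becomes three full 256-byte units rather than two and a quarter.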


2955 Commits