llvm-project

Author	SHA1	Message	Date
Leandro Lacerda	eed5f06ae8	[Offload][Conformance] Add randomized tests for single-precision bivariate math functions (#154663 ) This patch adds a new set of randomized conformance tests for single-precision bivariate math functions. The functions included in this set were selected based on the following criteria: - An implementation exists in `libc/src/math/generic` (i.e., it is not just a wrapper around a compiler built-in). - The corresponding LLVM CPU libm implementation is correctly rounded. - The function is listed in Table 65 of the OpenCL C Specification v3.0.19.	2025-08-21 11:27:25 -05:00
Ross Brunton	2e74cc6c04	[Offload][NFC] Use a sensible order for APIGen (#154518 ) The order in which entries in the tablegen API files are iterated is not the order in which they appear in the file. To avoid any issues with the order changing in future, we now generate all definitions of a certain class before any class that can use them. This is NFC; the definitions don't actually change, just the order in which they appear in the OffloadAPI.h header.	2025-08-21 09:38:21 +01:00
Ross Brunton	273ca1f77b	[Offload] Fix `OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE` on AMD (#154521 ) This wasn't handled with the normal info API, so needs special handling.	2025-08-21 09:37:58 +01:00
Dominik Adamski	b69fd34e76	[Offload] Add oneInterationPerThread param to loop device RTL (#151959 ) Currently, Flang can generate no-loop kernels for all OpenMP target kernels in the program if the flags -fopenmp-assume-teams-oversubscription or -fopenmp-assume-threads-oversubscription are set. If we add an additional parameter, we can choose in the future which OpenMP kernels should be generated as no-loop kernels. This PR doesn't modify current behavior of oversubscription flags. RFC for no-loop kernels: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517	2025-08-21 09:03:56 +02:00
Leandro Lacerda	8d7b50e572	[Offload][Conformance] Add `RandomGenerator` for large input spaces (#154252 ) This patch implements the `RandomGenerator`, a new input generator that enables conformance testing for functions with large input spaces (e.g., double-precision math functions). Architectural Refactoring: To support different generation strategies in a clean and extensible way, the existing `ExhaustiveGenerator` was refactored into a new class hierarchy: * A new abstract base class, `RangeBasedGenerator`, was introduced using the Curiously Recurring Template Pattern (CRTP). It contains the common logic for generators that operate on a sequence of ranges. * `ExhaustiveGenerator` now inherits from this base class, simplifying its implementation. New Components: * The new `RandomGenerator` class also inherits from `RangeBasedGenerator`. It implements a strategy that randomly samples a specified number of points from the total input space. * Random number generation is handled by a new, self-contained `RandomState` class (a `xorshift64` PRNG seeded with `splitmix64`) to ensure deterministic and reproducible random streams for testing. Example Usage: As a first use case and demonstration of this new capability, this patch also adds the first double-precision conformance test for the `log` function. This test uses the new `RandomGenerator` to validate the implementations from the `llvm-libm`, `cuda-math`, and `hip-math` providers.	2025-08-20 13:37:01 -05:00
David Tenty	63195d3d7a	[NFC][CMake] quote ${CMAKE_SYSTEM_NAME} consistently (#154537 ) A CMake change included in CMake 4.0 makes `AIX` into a variable (similar to `APPLE`, etc.) `ff03db6657` However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to `AIX` and `if` auto-expands variable names in CMake. That means you get a double expansion if you write: `if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")` which becomes: `if (AIX MATCHES "AIX")` which is as if you wrote: `if (ON MATCHES "AIX")` You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}", due to policy [CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054) which is on by default in 4.0+. Most of the LLVM CMake already does this, but this PR fixes the remaining cases where we do not.	2025-08-20 12:45:41 -04:00
Ross Brunton	c8986d1ecb	[Offload] Guard olMemAlloc/Free with a mutex (#153786 ) Both these functions update an `AllocInfoMap` structure in the context, however they did not use any locks, causing random failures in threaded code. Now they use a mutex.	2025-08-20 13:23:57 +01:00
Ross Brunton	2c11a83691	[Offload] Add olCalculateOptimalOccupancy (#142950 ) This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on Cuda; AMDGPU and Host return unsupported. --------- Co-authored-by: Callum Fare <callum@codeplay.com>	2025-08-19 15:16:47 +01:00
Rafal Bielski	9c9d9e4cb6	[Offload] Define additional device info properties (#152533 ) Add the following properties in Offload device info: * VENDOR_ID * NUM_COMPUTE_UNITS * [SINGLE|DOUBLE|HALF]_FP_CONFIG * NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF] * MAX_CLOCK_FREQUENCY * MEMORY_CLOCK_RATE * ADDRESS_BITS * MAX_MEM_ALLOC_SIZE * GLOBAL_MEM_SIZE Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented. Generate the per-type enums using `foreach` to reduce code duplication. Use macros in unit test definitions to reduce code duplication.	2025-08-19 13:02:01 +01:00
Akash Banerjee	6aafe6582d	Fix test added in 1fd1d634630754cc9b9c4b5526961d5856f64ff9	2025-08-18 13:29:23 +01:00
Abhinav Gaba	12769aa728	[Offload] Introduce ATTACH map-type support for pointer attachment. (#149036 ) This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP-compliant conditional pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, these two possible maps could be emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ``` (B) does not perform any pointer attachment, while (C) also maps the pointer p; both are incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as this requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with the ATTACH map-type bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)`, which can be used to support pointer attachment on Fortran descriptors. Note that this requires shadow-pointer tracking to support them as well. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi! --------- Co-authored-by: Alex Duran <alejandro.duran@intel.com>	2025-08-17 15:17:04 -07:00
Abhinav Gaba	79cf877627	[Offload] Introduce dataFence plugin interface. (#153793 ) The purpose of this fence is to ensure that any `dataSubmit`s inserted into a queue before a `dataFence` finish before any `dataSubmit`s inserted after it begin. This is a no-op for most queues, since they are in-order, and by design any operations inserted into them occur in order. But the interface is supposed to be functional for out-of-order queues. The addition of the interface means that any operations that rely on such ordering (like ATTACH map-type support in #149036) can invoke it, without worrying about whether the underlying queue is in-order or out-of-order. Once a plugin supports out-of-order queues, the plugin can implement this function, without requiring any change at the libomptarget level. --------- Co-authored-by: Alex Duran <alejandro.duran@intel.com>	2025-08-15 11:49:35 -07:00
Akash Banerjee	1fd1d63463	[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048 ) Add a new AutomapToTargetData pass. This gathers the declare target enter variables which have the AUTOMAP modifier and adds omp.declare_target_enter/exit mapping directives for fir.alloca and fir.free operations on the AUTOMAP enabled variables. Automap Ref: OpenMP 6.0 section 7.9.7.	2025-08-15 15:41:41 +01:00
Ross Brunton	30c7951136	[Offload] `olLaunchHostFunction` (#152482 ) Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.	2025-08-15 09:39:48 +01:00
Abhinav Gaba	2912c9c249	[NFC][Offload] Add missing maps to OpenMP offloading tests. (#153103 ) A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an `int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be mapped, which is incorrect. This change fixes six such tests.	2025-08-14 12:22:28 -07:00
Ross Brunton	3e9f29cfee	[Offload] Store globals in the program's global list rather than the kernel list (#153441 )	2025-08-13 17:18:25 +01:00
Callum Fare	aa6f591b63	[Offload] Implement hasPendingWork on CUDA (#152728 ) Following on from #152304, implement the new query in the CUDA plugin	2025-08-13 16:35:23 +01:00
Akash Banerjee	1c7720ef78	Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048 )" This reverts commit 4e6d510eb3ec5b5e5ea234756ea1f0b283feee4a.	2025-08-12 20:19:45 +01:00
Akash Banerjee	4e6d510eb3	[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048 ) Add a new AutomapToTargetData pass. This gathers the declare target enter variables which have the AUTOMAP modifier and adds omp.declare_target_enter/exit mapping directives for fir.alloca and fir.free operations on the AUTOMAP enabled variables. Automap Ref: OpenMP 6.0 section 7.9.7.	2025-08-12 15:18:15 +01:00
Amit Tiwari	2074e1320f	[Clang][OpenMP] Non-contiguous strided update (#144635 ) This patch handles the strided update in the `#pragma omp target update from(data[a:b:c])` directive, where 'c' represents the stride, leading to a non-contiguous update of the `data` array when the offloaded execution returns control from the device to the host using the `from` clause. Issue: Where Clang CodeGen generates the info for a particular `MapType` (to, from, etc.), it failed to detect the strided access. Because of this, the `MapType` bits were incorrect when passed to the runtime, which led to incorrect (contiguous) execution in the libomptarget runtime code. Added a minimal testcase that verifies the patch.	2025-08-12 19:32:15 +05:30
Akash Banerjee	0998da27e9	Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989 )" This reverts commit 5a5e8ba0c388d57aecb359ed67919cda429fc7b1.	2025-08-11 13:52:39 +01:00
Akash Banerjee	5a5e8ba0c3	[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989 ) Add a new `AutomapToTargetData` pass. This gathers the declare target enter variables which have the `AUTOMAP` modifier and adds `omp.declare_target_enter/exit` mapping directives for `fir.allocmem` and `fir.freemem` operations on the `AUTOMAP` enabled variables. Automap Ref: OpenMP 6.0 section 7.9.7.	2025-08-11 13:18:38 +01:00
Kevin Sala Penades	5751e96f9a	[Offload][NFC] Re-enable clang-format for omptarget.h (#152937 )	2025-08-10 15:46:37 -07:00
Kevin Sala Penades	7de50beb52	[Offload] Fix return error with a condition (#152876 ) Adds a conditional to the error return so that it only returns if there was an error.	2025-08-10 12:03:09 -07:00
Ross Brunton	910d7e90bf	[Offload] Make olLaunchKernel test thread safe (#149497 ) This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which introduced a race condition in liboffload).	2025-08-08 10:57:04 +01:00
Ross Brunton	197d1c1570	[Offload] OL_QUEUE_INFO_EMPTY (#152473 ) Add a queue query that (if possible) reports whether the queue is empty	2025-08-08 10:20:45 +01:00
Leandro Lacerda	27ed1f99e2	[Offload][Conformance] Add support for CUDA Math and HIP Math providers (#152362 ) This patch extends the conformance testing infrastructure to support two new providers of math function implementations for GPUs: CUDA Math (`cuda-math`) and HIP Math (`hip-math`).	2025-08-07 18:32:34 -05:00
Ross Brunton	900d20d0dc	[NFC][Offload] Move conformance test warning outside of function (#152466 ) `add_conformance_test` checks for libc and prints a warning if it is not found. However, this warning ends up being printed once for each test, spamming the cmake log. Moving it up to the folder cmake allows it to be reported only once.	2025-08-07 08:14:26 -05:00
Ross Brunton	a44532544b	[Offload] Don't create events for empty queues (#152304 ) Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.	2025-08-07 10:16:33 +01:00
hidekisaito	83e5a99ff6	[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882 ) Enables AMD data center class GPUs to use memory manager memory pooling up to 3GB allocation by default, up from the "1 << 13" threshold that all plugin-nextgen devices use.	2025-08-06 14:41:20 -07:00
Alex Duran	f092b820d1	[OFFLOAD] Fix typo in assert (#152316 ) Fixes an issue introduced by PR https://github.com/llvm/llvm-project/pull/143491.	2025-08-06 17:01:47 +02:00
Alex Duran	66d1c37eb6	[OFFLOAD][OPENMP] 6.0 compatible interop interface (#143491 ) The following patch introduces a new interop interface implementation with the following characteristics: * It supports the new 6.0 prefer_type specification. * It supports both explicit objects (from interop constructs) and implicit objects (from variant calls). * It implements a per-thread reuse mechanism for implicit objects to reduce overheads. * It provides a plugin interface that allows selecting the supported interop types, and managing all the backend related interop operations (init, sync, ...). * It enables cooperation with the OpenMP runtime to allow progress on OpenMP synchronizations. * It cleans up some vendor/fr_id mismatches from the current query routines. * It supports extension to define interop callbacks for library cleanup.	2025-08-06 16:34:39 +02:00
Ross Brunton	ca13c44bbc	[NFC][Offload] Clarify `olDestroyQueue` (#152132 ) This has no code changes.	2025-08-06 15:34:31 +01:00
Joseph Huber	3bc1b15235	[OpenMP] Fix weak linkage on malloc declaration Summary: This being weak forces the external reference to be weak. Either we define it weak or not by pulling it from `libc`. Doing it here causes it to not be extracted properly.	2025-08-05 17:38:59 -05:00
Leandro Lacerda	cd0268063d	[Offload][Conformance] Add tests for single-precision math functions (#152013 ) This patch adds a new set of conformance tests for single-precision math functions provided by the LLVM libm for GPUs. The functions included in this set were selected based on the following criteria: - An implementation exists in `libc/src/math/generic` (i.e., it is not just a wrapper around a compiler built-in). - The corresponding LLVM CPU libm implementation is correctly rounded. - The function is listed in Table 65 of the OpenCL C Specification v3.0.19.	2025-08-05 13:33:20 -05:00
Ross Brunton	d03692a00e	[Offload] Rework `MAX_WORK_GROUP_SIZE` (#151926 ) `MAX_WORK_GROUP_SIZE` now represents the maximum total number of work groups the device can allocate, rather than the maximum per dimension. `MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old behaviour.	2025-08-04 15:21:24 +01:00
Leandro Lacerda	f1eb869bae	[Offload][UnitTests] Build device code as C++ (#151714 ) This commit refactors the `add_offload_test_device_code` CMake function to compile device code using the C++ compiler (`CMAKE_CXX_COMPILER`) instead of the C compiler. This change enables the use of C++ features, such as templates, within device-side test kernels. This will allow for more advanced and reusable kernel wrappers, reducing boilerplate code in the conformance test suite. As part of this change: - All `.c` files for device code in `unittests/` have been renamed to `.cpp`. - Kernel definitions are now wrapped in `extern "C"` to ensure C linkage and prevent name mangling. This change affects the `OffloadAPI` and `Conformance` test suites. cc @callumfare @RossBrunton @jhuber6	2025-08-04 07:00:51 -05:00
Leandro Lacerda	acdc01b039	[Offload][UnitTests] Fix incorrect CUDA path variable in CMake helper (#151820 ) This PR fixes a minor bug in the `add_offload_test_device_code` CMake helper function in `offload/unittests/CMakeLists.txt`. The function was discovering the local CUDA Toolkit path and storing it in the `cuda_path` variable but was then using the incorrect `CUDA_ROOT` variable in the `add_custom_command` call for the NVPTX target. This change corrects the command to use the intended `cuda_path` variable.	2025-08-02 09:09:30 -05:00
Joseph Huber	8934a6e13b	[OpenMP] Use the `libc` malloc for AMDGPU if available (#151241 ) Summary: This patch enables the OpenMP runtime to use the general-purpose `malloc` interface in `libc` if the user built OpenMP with it enabled. All this requires is keeping `malloc` as an external function so it will be resolved later by the linker.	2025-08-01 20:41:06 -05:00
Joachim	a86ad73064	[offload] Add missing build dependency (#149326 ) libc++ headers must be generated before compiling part of liboffload. The build error occurs if clang is configured to use libc++ by default. Fixes issue #149324	2025-07-30 09:44:46 +02:00
Leandro Lacerda	2abd58cb7e	[Offload] Add framework for math conformance tests (#149242 ) This PR introduces the initial version of a C++ framework for the conformance testing of GPU math library functions, building upon the skeleton provided in #146391. The main goal of this framework is to systematically measure the accuracy of math functions in the GPU libc, verifying correctness or at least conformance to standards like OpenCL via exhaustive or random accuracy tests.	2025-07-29 11:08:27 -05:00
Aiden Grossman	2e3fd547de	[Offload] Fix typo in shared_lib_fp_mapping.c Made a typo in 963259ef6be4871e5252ff3ac9df737af5d2b4cb because I cannot run tests and also did not review it. This should fix it...	2025-07-25 23:17:46 +00:00
Aiden Grossman	963259ef6b	[Offload] Remove uses of %T from lit tests (#150721 ) This patch removes all the instances of %T from offload/ (only one test contained this construction). %T has been deprecated for ~7 years and is not recommended as it does not use a unique directory per test. Switch to using %t to ensure we use a unique dir per test and so that we can eventually remove %T. I did not actually test this. A couple feeble attempts at building/running the offload tests just leaves me with a ton of test failures. Given how small this is I'm reasonably sure it works though.	2025-07-25 16:16:22 -07:00
Ross Brunton	ae44418f28	[Offload] Erase entries from JIT cache when program is destroyed (#148847 ) When `unloadBinary` is called, any entries in the JITEngine's cache for that binary will be cleared. This fixes a nasty issue with liboffload program handles. If two handles happen to have had the same address (after one was free'd, for example), the cache would be hit and return the wrong program.	2025-07-25 16:11:30 +01:00
agozillon	73272d6fc6	[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization (#150311 ) Currently, we return early whenever we've already generated an allocation for intermediate descriptor variables (required in certain cases when we can't directly access the base address of a passed-in descriptor function argument due to HLFIR/FIR restrictions). This, unfortunately, skips over the presence check and load/store required to set the intermediate descriptor allocation's values/data. This is fine in most cases, but if a function happens to have a series of branches with separate target regions capturing the same input argument, we'd emit the present/load/store into the first branch with the first target inside of it; the second (or any subsequent) branch would not have the present/load/store. This would lead to the subsequent mapped values in that branch being empty, and then to a memory access violation on device. The fix for the moment is to emit a present/load/store at the relevant location of every target utilising the input argument. This will likely also fix possible issues with the input argument being manipulated in between target regions (primarily resizing; the data should remain the same as we're just copying an address around, in theory at least). There are possible optimizations/simplifications to emit fewer load/stores, such as raising the load/store out of the branches when we can, but I'm inclined to leave this sort of optimization to lower level passes such as an LLVM pass (which very possibly already covers it).	2025-07-25 16:15:54 +02:00
Ross Brunton	adb2421202	[Offload] Refactor device information queries to use new tagging (#147318 ) Instead of using strings to look up device information (which is brittle and slow), use the new tags that the plugins specify when building the nodes.	2025-07-25 14:51:51 +01:00
Callum Fare	78faf99c4f	[Offload] Fix olWaitEvents tests after change to events API (#150465 ) Fix the olWaitEvents tests after #150217 broke them	2025-07-24 18:35:47 +01:00
Ross Brunton	690c3ee5be	[Offload] Replace "EventOut" parameters with `olCreateEvent` (#150217 ) Rather than having every "enqueue"-type function have an output pointer specifically for an output event, just provide an `olCreateEvent` entrypoint which pushes an event to the queue. For example, replace: ```cpp olMemcpy(Queue, ..., EventOut); ``` with ```cpp olMemcpy(Queue, ...); olCreateEvent(Queue, EventOut); ```	2025-07-24 14:31:06 +01:00
hidekisaito	75e60e745b	[AMDGPU][Offload][LIT] Run unified_shared_memory tests on gfx950 (#150372 ) Enables 9 more tests	2025-07-23 22:46:26 -07:00
Ross Brunton	081b74caf5	[Offload] Add olWaitEvents (#150036 ) This function causes a queue to wait until all the provided events have completed before running any future scheduled work.	2025-07-23 14:12:16 +01:00
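
Commit 8d7b50e572 above describes `RandomState` as a `xorshift64` PRNG seeded with `splitmix64`. The following is a minimal C++ sketch of that combination, not the code from the patch: the class layout, the shift triple (13, 7, 17), and the zero-state guard are assumptions chosen for illustration.

```cpp
#include <cstdint>

// One splitmix64 step: advances `seed` and returns a well-mixed 64-bit value.
// The constants are the standard splitmix64 constants.
inline std::uint64_t splitmix64(std::uint64_t &seed) {
  seed += 0x9E3779B97F4A7C15ULL;
  std::uint64_t z = seed;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Hypothetical stand-in for the patch's RandomState (layout is assumed).
class RandomState {
public:
  explicit RandomState(std::uint64_t seed) {
    // xorshift64 is stuck at zero forever if its state is zero,
    // so keep drawing from splitmix64 until the state is non-zero.
    do {
      state_ = splitmix64(seed);
    } while (state_ == 0);
  }

  // One xorshift64 step using the common (13, 7, 17) shift triple.
  std::uint64_t next() {
    state_ ^= state_ << 13;
    state_ ^= state_ >> 7;
    state_ ^= state_ << 17;
    return state_;
  }

private:
  std::uint64_t state_;
};
```

Two `RandomState` objects built from the same seed produce identical streams, which is the reproducibility property the commit message calls out.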

407 Commits
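
Commit c8986d1ecb above fixes random failures caused by `olMemAlloc`/`olMemFree` updating a shared `AllocInfoMap` without a lock. A minimal sketch of the general pattern, assuming a plain `std::unordered_map` and `std::mutex`, is shown below; `AllocTracker` and its members are hypothetical names and do not reflect the liboffload implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical allocation tracker; illustrates guarding a shared map
// with a mutex so that concurrent alloc/release calls do not race.
class AllocTracker {
public:
  void *alloc(std::size_t size) {
    void *ptr = std::malloc(size);
    if (!ptr)
      return nullptr;
    std::lock_guard<std::mutex> lock(mutex_);
    sizes_[ptr] = size; // record the allocation under the lock
    return ptr;
  }

  // Returns false if `ptr` was not allocated through this tracker.
  bool release(void *ptr) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (sizes_.erase(ptr) == 0)
        return false;
    }
    // Free outside the lock; the entry is already gone from the map.
    std::free(ptr);
    return true;
  }

  std::size_t liveCount() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return sizes_.size();
  }

private:
  mutable std::mutex mutex_;
  std::unordered_map<void *, std::size_t> sizes_;
};
```

Erasing the map entry before calling `std::free` means the address cannot be handed out again by the allocator while a stale entry for it still exists.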