llvm-project

Author	SHA1	Message	Date
Michael Halkenhäuser	d36f66b42d	[NFC][offload][OMPT] Cleanup of OMPT internals (#109005 ) Removed `OmptCallbacks.cpp` since relevant contents were duplicated. Because of the static linking there should be no change in functionality.	2024-09-23 11:58:40 +02:00
Johannes Doerfert	08533a3ee8	[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280 ) We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils::advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.	2024-09-05 13:36:26 -07:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
Johannes Doerfert	3b7611594f	[Offload] Improve error reporting on memory faults (#104254 ) Since we can already track allocations, we can diagnose memory faults to some degree. If the fault happens in a prior allocation (use after free) or "close but outside" one, we can provide that information to the user. Note that the fault address might be page aligned, and not all accesses trigger a fault, especially for allocations that are backed by a MemoryManager. Still, if people disable the MemoryManager or the allocation is big enough, we can sometimes provide valuable feedback.	2024-08-21 10:01:35 -07:00
estewart08	ea8bb4d633	[offload] - Fix issue with standalone debug offload build (#104647 ) Error: CommandLine Error: Option 'attributor-manifest-internal' registered more than once During the standalone debug build of offload the above error is seen at app runtime when using a prebuilt llvm with LLVM_LINK_LLVM_DYLIB=ON. This is caused by linking both libLLVM.so and various archives that are found via llvm_map_components_to_libnames for jit support.	2024-08-19 17:59:21 -05:00
Johannes Doerfert	80525dfcde	[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549 ) Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm` functions to orchestrate the kernel launch rather than `cuda` functions. These `llvm` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this will help us in the design process as a second offload language. We do not support any CUDA APIs yet, however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging). ``` ❯❯❯ cat test.cu extern "C" { void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum); void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum); } __global__ void square(int *A) { *A = 42; } int main(int argc, char **argv) { int DevNo = 0; int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo)); *Ptr = 7; printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr); square<<<1, 1>>>(Ptr); printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr); llvm_omp_target_free_shared(Ptr, DevNo); } ❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native ❯❯❯ llvm-objdump --offloading test123 test123: file format elf64-x86-64 OFFLOADING IMAGE [0]: kind elf arch gfx90a triple amdgcn-amd-amdhsa producer openmp ❯❯❯ LIBOMPTARGET_INFO=16 ./test123 Ptr 0x155448ac8000, *Ptr 7 Ptr 0x155448ac8000, *Ptr 42 ```	2024-08-12 17:44:58 -07:00
Johannes Doerfert	f3bfc56327	[Offload][OpenMP] Prettify error messages by "demangling" the kernel name (#101400 ) The kernel names for OpenMP are manually mangled and not ideal when we report something to the user. We demangle them now, providing the function and line number of the target region, together with the actual kernel name.	2024-08-01 15:24:15 -07:00
Johannes Doerfert	9a1013220b	[Offload] Allow to record kernel launch stack traces (#100472 ) Similar to (de)allocation traces, we can record kernel launch stack traces and display them in case of an error. However, the AMD GPU plugin signal handler, which is invoked on memory faults, cannot pinpoint the offending kernel. Instead, we print `<NUM>` traces, where `<NUM>` is set via `OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`. The recording uses a ring buffer of fixed size (for now 8). For `trap` errors, we print the actual kernel name, and the trace if recorded.	2024-07-31 11:49:50 -07:00
Johannes Doerfert	c95abe94ae	[Offload] Implement double free (and other allocation error) reporting (#100261 ) As a first step towards a GPU sanitizer we now can track allocations and deallocations in order to report double frees, and other problems during deallocation.	2024-07-30 10:10:57 -07:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587 ) This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
Tim Gymnich	597d2f7662	[OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts (#89239 ) Sometimes it might be beneficial to spawn more thread blocks instead of reusing existing ones for multiple loop iterations. Alternatives considered: Make `DefaultNumBlocks` settable via an environment variable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-06-14 07:35:23 -07:00
Johannes Doerfert	54b5c76d3b	[Offload] Use flat array for cuLaunchKernel (#95116 ) We already used a flat array of kernel launch parameters for the AMD GPU launch but now we also use this scheme for the NVIDIA GPU launch. The only remaining/required use of the indirection is the host plugin (due to ffi). This allows us to simplify the use for non-OpenMP kernel launch.	2024-06-13 09:43:47 +03:00
Joseph Huber	435aa7663d	[Libomptarget] Rework device initialization and image registration (#93844 ) Summary: Currently, we register images into a linear table according to the logical OpenMP device identifier. We then initialize all of these images as one block. This logic requires that images are compatible with all devices instead of just the one that it can run on. This prevents us from running on systems with heterogeneous devices (i.e. image 1 runs on device 0, image 0 runs on device 1). This patch reworks the logic by instead making the compatibility check a per-device query. We then scan every device to see if it's compatible and do it as they come.	2024-06-06 08:10:56 -05:00
Joseph Huber	21f3a6091f	[Offload] Only initialize a plugin if it is needed (#92765 ) Summary: Initializing the plugins requires initializing the runtime like CUDA or HSA. This has a considerable overhead on most platforms, so we should only actually initialize a plugin if it is needed by any image that is loaded.	2024-05-23 09:36:47 -05:00
Joseph Huber	16bb7e89a9	[Offload][NFC] Remove all trailing whitespace from offload/ (#92578 ) Summary: This patch cleans up the trailing whitespace in a bunch of tests and CMake files. Most just in preparation for other cleanups.	2024-05-17 13:15:04 -05:00
Joseph Huber	c4017cda00	[Offload][NFC] Remove header license in CMake files (#92544 ) Summary: No other project has these in the CMake itself, and they're wildly inconsistent even within the project. These don't really add anything so I think they should be removed.	2024-05-17 09:05:03 -05:00
Joseph Huber	f42f57b52d	[Libomptarget] Rework Record & Replay to be a plugin member (#88928 ) (#89097 ) Summary: Previously, the R&R support was global state initialized by a global constructor. This is bad because it prevents us from adequately constraining the lifetime of the library. Additionally, we want to minimize the amount of global state floating around. This patch moves the R&R support into a plugin member like everything else. This means there will be multiple copies of the R&R implementation floating around, but this was already the case given the fact that we currently handle everything with dynamic libraries.	2024-05-16 14:58:46 -05:00
Joseph Huber	3abd3d6e59	[Libomptarget] Remove requires information from plugin (#80345 ) Summary: Currently this is only used for the zero-copy handling. However, this can easily be moved into `libomptarget` so that we do not need to bother setting the requires flags in the plugin. The advantage here is that we no longer need to do this for every device redundantly. Additionally, these requires flags are specifically OpenMP related, so they should live in `libomptarget`.	2024-05-16 11:13:50 -05:00
Joseph Huber	81d20d861e	[Offload][NFC] Fix warning messages in runtime Summary: These are lots of random warnings due to inconsistent initialization or signedness.	2024-05-15 15:30:38 -05:00
Joseph Huber	302db1ab5a	[Offload] Do not link every target for JIT (#92013 ) Summary: The offload library supports basic JIT functionality, however we currently link against every single target even though only AMDGPU and NVPTX are supported. This somewhat bloats the dynamic library list, so we should constrain it to what's actually used.	2024-05-14 13:57:34 -05:00
Joseph Huber	363258a3cc	[Offload] Remove old references to `isCtor` (#91766 ) Summary: These have long since been removed, support for ctors / dtors now happens through special kernels the backend creates.	2024-05-14 06:00:34 -05:00
Joseph Huber	fa9e90f5d2	[Reland][Libomptarget] Statically link all plugin runtimes (#87009 ) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of live runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868	2024-05-09 09:38:22 -05:00
Joseph Huber	e5e66073c3	Revert "[Libomptarget] Statically link all plugin runtimes (#87009 )" Caused failures on build-bots, reverting to investigate. This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.	2024-05-09 07:05:23 -05:00
Joseph Huber	80f9e814ec	[Libomptarget] Statically link all plugin runtimes (#87009 ) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of live runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868	2024-05-09 06:35:54 -05:00
Joseph Huber	e3938f4d71	[Offload] Detect native ELF machine from preprocessor (#91282 ) Summary: This gets the target's corresponding ELF value from the preprocessor. We use this to detect if a given ELF is compatible with the CPU offloading implementation for OpenMP. Previously we used definitions from CMake, but this is easier for people to understand as there may be new users of this in the future.	2024-05-08 14:36:58 -05:00
Jhonatan Cléto	b438a817bd	[Offload] Fix dataDelete op for TARGET_ALLOC_HOST memory type (#91134 ) Summary: The `GenericDeviceTy::dataDelete` method doesn't verify the `TargetAllocTy` of the device pointer. Because of this, it can use the `MemoryManager` to free the ptr. However, the `TARGET_ALLOC_HOST` and `TARGET_ALLOC_SHARED` types are not allocated using the `MemoryManager` in the `GenericDeviceTy::dataAlloc` method. Since the `MemoryManager` uses the `DeviceAllocatorTy::free` operation without specifying the type of the ptr, some plugins may use incorrect operations to free ptrs of certain types. In particular, this bug causes the CUDA plugin to use the `cuMemFree` operation on ptrs of type `TARGET_ALLOC_HOST`, resulting in an unchecked error, as shown in the output snippet of the test `offload/test/api/omp_host_pinned_memory_alloc.c`: ``` omptarget --> Notifying about an unmapping: HstPtr=0x00007c6114200000 omptarget --> Call to llvm_omp_target_free_host for device 0 and address 0x00007c6114200000 omptarget --> Call to omp_get_num_devices returning 1 omptarget --> Call to omp_get_initial_device returning 1 PluginInterface --> MemoryManagerTy::free: target memory 0x00007c6114200000. PluginInterface --> Cannot find its node. Delete it on device directly. TARGET CUDA RTL --> Failure to free memory: Error in cuMemFree[Host]: invalid argument omptarget --> omp_target_free deallocated device ptr ``` This patch fixes this by adding the check of the device pointer type before calling the appropriate operation for each type.	2024-05-07 22:21:32 -05:00
Joseph Huber	b07177fb68	[Libomptarget] Rework interface for enabling plugins (#86875 ) Summary: Previously we would build all of the plugins by default and then only load some using the `LIBOMPTARGET_PLUGINS_TO_LOAD` variable. This patch renamed this to `LIBOMPTARGET_PLUGINS_TO_BUILD` and changes whether or not it will include the plugin in CMake. Additionally this patch creates a new `Targets.def` file that allows us to enumerate all of the enabled plugins. This is somewhat different from the old method, and it's done this way for future use that will need to be shared. This follows the same method that LLVM uses for its targets, however it does require adding an extra include path. Depends on https://github.com/llvm/llvm-project/pull/86868	2024-04-29 11:18:37 -05:00
Johannes Doerfert	330d8983d2	[Offload] Move `/openmp/libomptarget` to `/offload` (#75125 ) In a nutshell, this moves our libomptarget code to populate the offload subproject. With this commit, users need to enable the new LLVM/Offload subproject as a runtime in their cmake configuration. No further changes are expected for downstream code. Tests and other components still depend on OpenMP and have also not been renamed. The results below are for a build in which OpenMP and Offload are enabled runtimes. In addition to the pure `git mv`, we needed to adjust some CMake files. Nothing is intended to change semantics. ``` ninja check-offload ``` Works with the X86 and AMDGPU offload tests ``` ninja check-openmp ``` Still works but doesn't build offload tests anymore. ``` ls install/lib ``` Shows all expected libraries, incl. - `libomptarget.devicertl.a` - `libomptarget-nvptx-sm_90.bc` - `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git` - `libomptarget.so` -> `libomptarget.so.18git` Fixes: https://github.com/llvm/llvm-project/issues/75124 --------- Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>	2024-04-22 09:51:33 -07:00

29 Commits