llvm-project

Author	SHA1	Message	Date
Jay Foad	4dd55c567a	[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399 ) Follow up to #109133.	2024-10-24 10:23:40 +01:00
Akash Banerjee	6b1c51bc05	[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343 ) This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions. Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>	2024-06-26 20:18:38 +01:00
Johannes Doerfert	f9a89e6b9c	[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752 ) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-11-01 11:11:48 -07:00
Johannes Doerfert	31b91213bd	[OpenMP] Unify the min/max thread/teams pathways We used to pass the min/max threads/teams values through different paths from the frontend to the middle end. This simplifies the situation by passing the values once, only when we will create the KernelEnvironment, which contains the values. At that point we also manifest the metadata, as appropriate. Some footguns have also been removed, e.g., our target check is now triple-based, not calling convention-based, as the latter is dependent on the ordering of operations. The types of the values have been unified to int32_t.	2023-10-29 10:53:20 -07:00
Johannes Doerfert	289a0f255d	[OpenMP] Remove SPMD specific handling during globalization Globalization and SPMD are different things that used to be conflated. Some leftover crossover interactions remain, trying to remove them now.	2023-10-26 14:38:23 -07:00
Shilei Tian	d6254e1b2e	Introduce the initial support for OpenMP kernel language (#66844 ) This patch starts the support for OpenMP kernel language, basically to write OpenMP target region in SIMT style, similar to kernel languages such as CUDA. What included in this first patch is the `ompx_bare` clause for `target teams` directive. When `ompx_bare` exists, globalization is disabled such that local variables will not be globalized. The runtime init/deinit function calls will not be emitted. That being said, almost all OpenMP executable directives are not supported in the region, such as parallel, task. This patch doesn't include the Sema checks for that, so the use of them is UB. Simple directives, such as atomic, can be used. We provide a set of APIs (for C, they are prefix with `ompx_`; for C++, they are in `ompx` namespace) to get thread id, block id, etc. Please refer to https://tianshilei.me/wp-content/uploads/llvm-hpc-2023.pdf for more details.	2023-10-05 17:38:06 -04:00
JP Lehr	1bff5f6d0b	Revert "[OpenMP] Introduce the initial support for OpenMP kernel language (#66844 )" This reverts commit e997dca3333823ffe2ea3aea288299f551532dcd.	2023-09-29 15:35:10 -05:00
Shilei Tian	e997dca333	[OpenMP] Introduce the initial support for OpenMP kernel language (#66844 ) This patch starts the support for OpenMP kernel language, basically to write OpenMP target region in SIMT style, similar to kernel languages such as CUDA. What included in this first patch is the `ompx_bare` clause for `target teams` directive. When `ompx_bare` exists, globalization is disabled such that local variables will not be globalized. The runtime init/deinit function calls will not be emitted. That being said, almost all OpenMP executable directives are not supported in the region, such as parallel, task. This patch doesn't include the Sema checks for that, so the use of them is UB. Simple directives, such as atomic, can be used. We provide a set of APIs (for C, they are prefix with `ompx_`; for C++, they are in `ompx` namespace) to get thread id, block id, etc. For more details, you can refer to https://tianshilei.me/wp-content/uploads/llvm-hpc-2023.pdf.	2023-09-29 13:11:09 -04:00
Sergio Afonso	63ca93c7d1	[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes `IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to `-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed to `omp.is_target_device`. Getters and setters of all these renamed properties are also updated accordingly. Many unit tests have been updated to use the new names, but an alias for the `-fopenmp-is-device` option is created so that external programs do not stop working after the name change. `IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the `-fopenmp-is-target-device` compiler frontend option, which is only added to the OpenMP device invocation for offloading-enabled programs. Differential Revision: https://reviews.llvm.org/D154591	2023-07-10 14:14:16 +01:00
Doru Bercea	13888870e5	Enable dynamic-sized VLAs for data sharing in OpenMP offloaded target regions. Review: https://reviews.llvm.org/D153883	2023-07-06 10:57:10 -04:00
Kazu Hirata	9bf9ac87d6	[CodeGen] Remove unused declaration createNVPTXRuntimeFunction The corresponding definition was removed by: commit 3cc1f1fc1d97952136185f4eafb827694875de17 Author: Joseph Huber <jhuber6@vols.utk.edu> Date: Thu Oct 8 12:03:11 2020 -0400	2023-06-10 21:52:49 -07:00
Kazu Hirata	54ab4b3a28	[CodeGen] Remove unused declarations emitNonSPMDParallelCall and emitSPMDParallelCall The corresponding function definitions were removed by: commit a2dbfb6b72db19ed851464160ef7539b50d43894 Author: Giorgis Georgakoudis <georgakoudis1@llnl.gov> Date: Wed Apr 21 11:41:31 2021 -0700	2023-05-26 20:07:52 -07:00
Itay Bookstein	782c59a4ee	[OpenMP] Prefix outlined and reduction func names with original func's name This patch prefixes omp outlined helpers and reduction funcs with the original function's name. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D140722	2023-04-19 23:00:26 +03:00
Itay Bookstein	6fdd13e0ec	Revert "[OpenMP] Prefix outlined and reduction func names with original func's name" This reverts commit 029bfc311d4d7d3cd90be81bb08c046848796d02.	2023-04-19 19:08:49 +03:00
Itay Bookstein	029bfc311d	[OpenMP] Prefix outlined and reduction func names with original func's name This patch attempts to prefix omp outlined helpers and reduction funcs with the original function's name. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D140722	2023-04-19 19:05:21 +03:00
Johannes Doerfert	90609fb68f	[OpenMP][NFCI] Remove effectively dead code in clang and the runtime Differential Revision: https://reviews.llvm.org/D136903	2022-12-13 18:44:19 -08:00
Johannes Doerfert	f9c29878b0	Revert "[OpenMP][NFCI] Remove effectively dead code in clang and the runtime" This reverts commit c1c8cbbf5f29257d084a23a2f6c4236c40b7afb9. One of the tests seems to be flaky/non-deterministic.	2022-12-12 22:08:28 -08:00
Johannes Doerfert	c1c8cbbf5f	[OpenMP][NFCI] Remove effectively dead code in clang and the runtime	2022-12-12 20:55:36 -08:00
Kazu Hirata	bb666c6930	[CodeGen] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-03 11:13:43 -08:00
Jan Sjodin	9ea2b150b5	[OpenMP][OMPIRBuilder] Migrate createOffloadEntriesAndInfoMetadata from clang to OpenMPIRBuilder This patch moves the createOffloadEntriesAndInfoMetadata to OpenMPIRBuilder, the createOffloadEntry helper function. The clang specific error handling is invoked using a callback. This code will also be used by flang in the future.	2022-11-03 10:27:44 -04:00
Dhruva Chakrabarti	839ac62c50	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit 7539e9cf811e590d9f12ae39673ca789e26386b4.	2022-09-15 03:08:46 +00:00
Giorgis Georgakoudis	7539e9cf81	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6, ABataev Differential Revision: https://reviews.llvm.org/D102107	2022-09-15 00:54:05 +00:00
Joseph Huber	2d26ecb1fb	[OpenMP] Remove simplified device runtime handling The old device runtime had a "simplified" version that prevented many of the runtime features from being initialized. The old device runtime was deleted in LLVM 14 and is no longer in use. Selectively deactivating features is now done using specific flags rather than the old technique. This patch simply removes the extra logic required for handling the old simple runtime scheme. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D133802	2022-09-14 09:41:50 -05:00
Kazu Hirata	3f3930a451	Remove redundaunt virtual specifiers (NFC) Identified with tidy-modernize-use-override.	2022-07-25 23:00:59 -07:00
Joseph Huber	bc9c4d7216	[OpenMP][FIX] Pass the num_threads value directly to parallel_51 The problem with the old scheme is that we would need to keep track of the "next region" and reset the num_threads value after it. The new RT doesn't do it and an assertion is triggered. The old RT doesn't do it either, I haven't tested it but I assume a num_threads clause might impact multiple parallel regions "accidentally". Further, in SPMD mode num_threads was simply ignored, for some reason beyond me. In any case, parallel_51 is designed to take the clause value directly, so let's do that instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D113623	2021-12-09 16:30:29 -05:00
Atmn Patel	737c4a2673	[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421	2021-11-09 15:11:05 -05:00
Atmn Patel	ef717f3852	Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files" This reverts commit 81a7cad2ffc18f15b732f69d991c8398c979c5ca.	2021-11-09 02:10:42 -05:00
Atmn Patel	81a7cad2ff	[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421	2021-11-09 01:52:52 -05:00
Joseph Huber	bad44d5f39	[OpenMP] Add RTL function for getting number of threads in block. This patch adds support for the `__kmpc_get_hardware_num_threads_in_block` function that returns the number of threads. This was missing in the new runtime and was used by the AMDGPU plugin which prevented it from using the new runtime. This patchs also unified the interface for getting the thread numbers in the frontend. Originally authored by jdoerfert. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D111475	2021-10-08 22:21:59 -04:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Alexey Bataev	b3c80dd894	[OPENMP]Remove const firstprivate allocation as a variable in a constant space. Current implementation is not compatible with asynchronous target regions, need to remove it. Differential Revision: https://reviews.llvm.org/D105375	2021-07-07 05:56:48 -07:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functinality with a single allocation command, effectively mimicing an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Saiyedul Islam	160ff83765	[OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3 Provides AMDGCN and NVPTX specific specialization of getGPUWarpSize, getGPUThreadID, and getGPUNumThreads methods. Adds tests for AMDGCN codegen for these methods in generic and simd modes. Also changes the precondition in InitTempAlloca to be slightly more permissive. Useful for AMDGCN OpenMP codegen where allocas are created with a cast to an address space. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D84260	2020-08-03 05:38:39 +00:00
Saiyedul Islam	c7562e77b3	[OpenMP][NFC] Generalize CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU Refactors CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU to make it a generalization for OpenMP GPU Codegen. Target specific specialized methods for NVPTX are defined in class CGOpenMPRuntimeNVPTX. This paves the way for a clean and maintainable extension to more GPU targets for OpenMP Codegen. For original author (git blame) list of CGOpenMPRuntimeGPU code, look in history of CGOpenMPRuntimeNVPTX.cpp and .h, after this commit. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D83723	2020-07-17 14:38:04 +00:00

38 Commits