llvm-project

Author	SHA1	Message	Date
theRonShark	9b61ff210f	Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309 ) Reverts llvm/llvm-project#185989	2026-03-13 05:20:40 +00:00
Joseph Huber	4376fbd793	[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989 ) Summary: We use this `dyn_ptr` argument in Clang/OpenMP to handle the `KernelLaunchEnvironment`. This is a per-kernel argument used to share some information. Currenetly, it's prepended to the argument list and we generate storage for it in the runtime. This is bad for a few reasons: 1. It changes the ABI by shifting user arguments 2. It cannot be trivially be left uninitialized if unused 3. The runtime must allocate its own memory for it This PR changes it to be appended instead. Additionally, space for this is always emitted. This means the OMPIRBuilder itself will provide the storage, we simply need to populate it in the runtime if it is used. This means that if it's unused we don't always pay the cost and it's easier for non-OpenMP users to ignore it. Backward compatibility is maintained by auto-upgrading the kernel arguments. In `libomptarget` we completely allocate a new buffer to store this in the new format. The plugins still need to respect the old ABI of the called device object, so we simply rotate it if it's the old version.	2026-03-12 18:08:22 -05:00
Kevin Sala Penades	0e92beb0c0	[Clang][OpenMP] Switch to __kmpc_parallel_60 with strict parameter (#171082 ) This commit switches the `__kmpc_parallel_51` to `__kmpc_parallel_60`, and adds the strict boolean for the number of threads.	2025-12-08 09:37:11 -08:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401 ) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the rutime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More uses cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Shilei Tian	6bd74fd65f	Revert commits for kernel environment This reverts commits for kernel environments as they causes issues in AMD BB.	2023-07-23 23:32:31 -04:00
Shilei Tian	c5c8040390	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-23 18:36:01 -04:00
Sergio Afonso	63ca93c7d1	[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes `IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to `-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed to `omp.is_target_device`. Getters and setters of all these renamed properties are also updated accordingly. Many unit tests have been updated to use the new names, but an alias for the `-fopenmp-is-device` option is created so that external programs do not stop working after the name change. `IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the `-fopenmp-is-target-device` compiler frontend option, which is only added to the OpenMP device invocation for offloading-enabled programs. Differential Revision: https://reviews.llvm.org/D154591	2023-07-10 14:14:16 +01:00
Shilei Tian	d4ecd1241c	Revert "[OpenMP] Introduce kernel environment" This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9. It makes a couple of buildbots unhappy because of the following test failures: - `Transforms/OpenMP/add_attributes.ll'` - `mapping/declare_mapper_target_data.cpp` on AMDGPU	2023-04-22 20:56:35 -04:00
Shilei Tian	35cfadfbe2	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-04-22 20:46:38 -04:00
Itay Bookstein	782c59a4ee	[OpenMP] Prefix outlined and reduction func names with original func's name This patch prefixes omp outlined helpers and reduction funcs with the original function's name. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D140722	2023-04-19 23:00:26 +03:00
Itay Bookstein	6fdd13e0ec	Revert "[OpenMP] Prefix outlined and reduction func names with original func's name" This reverts commit 029bfc311d4d7d3cd90be81bb08c046848796d02.	2023-04-19 19:08:49 +03:00
Itay Bookstein	029bfc311d	[OpenMP] Prefix outlined and reduction func names with original func's name This patch attempts to prefix omp outlined helpers and reduction funcs with the original function's name. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D140722	2023-04-19 19:05:21 +03:00
Dhruva Chakrabarti	1c9ec74e3f	[Clang][OpenMP] Insert alloca for kernel args at function entry block instead of the launch point. If an inlined kernel is called in a loop, the launch point alloca would lead to increasing stack usage every time the kernel is invoked. This could make the application run out of stack space and crash. This problem is fixed by using the alloca insertion point while creating the alloca instruction. Fixes https://github.com/llvm/llvm-project/issues/60602 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D145820	2023-03-17 16:36:12 -04:00
Shilei Tian	270f533e8b	[Clang][OpenMP] Update tests using update_cc_test_checks.py Make preparation for other patches Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D144320	2023-02-21 21:09:28 -05:00
Shilei Tian	2ce96784ba	Revert "[Clang][OpenMP] Update tests using update_cc_test_checks.py" This reverts commit 61faf261506f93819e48f050f318f96452831f92 as it makes buildbot unhappy.	2023-02-21 20:28:00 -05:00
Shilei Tian	61faf26150	[Clang][OpenMP] Update tests using update_cc_test_checks.py Make preparation for other patches Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D144320	2023-02-21 20:09:17 -05:00
Joseph Huber	a8ec170e01	[OpenMP] Make the exec_mode global have protected visibility We use protected visibility for almost everything with offloading. This is because it provides us with the ability to read things from the host without the expectation that it will be preempted by a shared library load, bugs related to this have happened when offloading to the host. This patch just makes the `exec_mode` global generated for each plugin have protected visibility. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D135285	2022-10-05 14:39:22 -05:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will be come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to bitset based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functinality with a single allocation command, effectively mimicing an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Pushpinder Singh	3a12ff0dac	[OpenMP][RTL] Remove dead code RequiresDataSharing was always 0, resulting dead code in device runtime library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D88829	2020-10-06 05:43:47 -04:00
Saiyedul Islam	a1bdf8f545	[OpenMP] Ensure testing for versions 4.5 and default - Part 2 Many OpenMP Clang tests do not RUN for version 4.5 and the default version. This second patch in the series handles test cases which require updation in CHECK lines along with adding RUN lines for the default version. It involves updating line number of pragmas. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D85150	2020-08-27 18:50:52 +00:00
Gheorghe-Teodor Bercea	2b40470c61	[OpenMP] Add a new version of the SPMD deinit kernel function Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D54970 llvm-svn: 347915	2018-11-29 20:53:49 +00:00
Alexey Bataev	8bcc69c054	[OPENMP][NVPTX]Extend number of constructs executed in SPMD mode. If the statements between target\|teams\|distribute directives does not require execution in master thread, like constant expressions, null statements, simple declarations, etc., such construct can be xecuted in SPMD mode. llvm-svn: 346551	2018-11-09 20:03:19 +00:00
Alexey Bataev	8d8e1235ab	[OPENMP][NVPTX] Add support for lightweight runtime. If the target construct can be executed in SPMD mode + it is a loop based directive with static scheduling, we can use lightweight runtime support. llvm-svn: 340953	2018-08-29 18:32:21 +00:00
Gheorghe-Teodor Bercea	ad4e579407	[OpenMP] Initialize data sharing stack for SPMD case Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit. Reviewers: ABataev, carlo.bertolli, caomhin Reviewed By: ABataev Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D49188 llvm-svn: 337015	2018-07-13 16:18:24 +00:00
Carlo Bertolli	79712097c7	[OpenMP] Extend NVPTX SPMD implementation of combined constructs Differential Revision: https://reviews.llvm.org/D43852 This patch extends the SPMD implementation to all target constructs and guards this implementation under a new flag. llvm-svn: 326368	2018-02-28 20:48:35 +00:00
Arpith Chacko Jacob	2cd6eeabfd	[OpenMP] Support for the proc_bind-clause on 'target parallel' on the NVPTX device. This patch adds support for the proc_bind clause on the Spmd construct 'target parallel' on the NVPTX device. Since the parallel region is created upon kernel launch, this clause can be safely ignored on the NVPTX device at codegen time for level 0 parallelism. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29128 llvm-svn: 293069	2017-01-25 16:55:10 +00:00

29 Commits