llvm-project

Author	SHA1	Message	Date
CHANDRA GHALE	76e6c8d3fc	Codegen changes for strict modifier with grainsize/num_tasks of taskloop construct (#117196 ) Initial parsing/sema for 'strict' modifier with 'num_tasks' and ‘grainsize’ clause is present in these commits [grainsize_parsing](`ab9eac762c`) and [num_tasks_parsing](`56c1660170 (diff-4184486638e85284c3a2c961a81e7752231022daf97e411007c13a6732b50db9R6545)`) . However, this implementation appears incomplete as it lacks code generation support. A runtime patch was introduced in this runtime commit [runtime_patch](`540007b427 (diff-5e95f9319910d6965d09c301359dbe6b23f3eef5ce4d262ef2c2d2137875b5c4R374)`) , which adds a new API, _kmpc_taskloop_5, to accommodate the strict modifier. In this patch I have added codegen support. When the strict modifier is present alongside the grainsize or num_tasks clauses of taskloop construct, the code now emits a call to _kmpc_taskloop_5, which includes an additional parameter of type i32 with the value 1 to indicate the strict modifier. If the strict modifier is not present, it falls back to the existing _kmpc_taskloop API call. --------- Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2024-11-28 14:18:59 +05:30
Kazu Hirata	e8a6624325	[CodeGen] Remove unused includes (NFC) (#116459 ) Identified with misc-include-cleaner.	2024-11-16 07:37:13 -08:00
Sergio Afonso	d87964de78	[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533 ) This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback to leave the IR in an invalid state and still continue the codegen process, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue always returning 'success'.	2024-10-25 11:30:16 +01:00
Jay Foad	4dd55c567a	[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399 ) Follow up to #109133.	2024-10-24 10:23:40 +01:00
Kareem Ergawy	ad70f3e095	[flang][OpenMP] Support `target enter\|update\|exit .. nowait` (#113305 ) Extends `nowait` support for other device directives. This PR refactors the task generation utils used for the `target` directive so that they are general enough to be reused for other device directives as well.	2024-10-23 10:48:54 +02:00
Youngsuk Kim	6d4edf2f75	[clang][CGOpenMPRuntime] Avoid Type::getPointerTo() (NFC) (#112017 ) `llvm::Type::getPointerTo()` is to be deprecated & removed soon.	2024-10-11 20:09:24 -04:00
Congcong Cai	eca5949031	[codegen][NFC] add static mark for internal usage variable and function (#109431 ) Detect by clang-tidy misc-use-internal-linkage	2024-09-24 07:25:07 +08:00
Kazu Hirata	cc96ce831c	[CodeGen] Avoid repeated hash lookup (NFC) (#108735 )	2024-09-15 01:20:29 -07:00
JOE1994	1b913cde2a	[clang][CodeGen] Strip unneeded calls to raw_string_ostream::str() (NFC) Try to avoid excess layer of indirection when possible. p.s. Remove a call to raw_string_ostream::flush() which is a no-op.	2024-09-13 20:33:58 -04:00
Kazu Hirata	0593b95ff4	[CGOpenMPRuntime] Avoid repeated hash lookups (NFC) (#107358 )	2024-09-05 08:36:09 -07:00
Kazu Hirata	8133d47632	[CGOpenMPRuntime] Use DenseMap::operator[] (NFC) (#107185 ) I'm planning to deprecate DenseMap::FindAndConstruct in favor of DenseMap::operator[].	2024-09-04 01:35:47 -07:00
Kazu Hirata	f15e3e58c5	[CGOpenMPRuntime] Use DenseMap::operator[] (NFC) (#107158 ) I'm planning to deprecate DenseMap::FindAndConstruct in favor of DenseMap::operator[].	2024-09-03 20:02:29 -07:00
Shilei Tian	0551926fda	[Clang][OMPX] Add the code generation for multi-dim `thread_limit` clause (#102717 )	2024-08-16 13:59:46 -04:00
Shilei Tian	1c269929d0	[Clang][Sema][OpenMP] Allow `thread_limit` to accept multiple expressions (#102715 )	2024-08-10 09:54:58 -04:00
Shilei Tian	ee8100ba02	[Clang][OMPX] Add the code generation for multi-dim `num_teams` (#101407 ) This patch adds the code generation support for multi-dim `num_teams` clause when it is used with `target teams ompx_bare` construct.	2024-08-09 10:33:41 -04:00
Shilei Tian	cee594cf36	[Clang][Sema][OpenMP] Allow `num_teams` to accept multiple expressions (#99732 ) By the OpenMP standard, `num_teams` clause can only accept one expression (for now). In this patch, we extend it to allow to accept multiple expressions when it is used with `target teams ompx_bare` construct. This will allow to launch a multi-dim grid, same as CUDA/HIP.	2024-08-06 10:55:15 -04:00
jyu2-git	6848b99d17	[OpenMP][Map][NFC] improve map chain. (#101903 ) This is for mapping structure has data members, which have 'default' mappers, where needs to map these members individually using their 'default' mappers. example map(tofrom: spp[0][0]), look at test case. currently create 6 maps: 1>&spp, &spp[0], size 8, maptype TARGET_PARAM \| FROM \| TO 2>&spp[0], &spp[0][0], size(D)with maptype OMP_MAP_NONE, nullptr 3>&spp[0], &spp[0][0].e, size(e) with maptype MEMBER_OF \| FROM \| TO 4>&spp[0], &spp[0][0].h, size(h) with maptype MEMBER_OF \| FROM \| TO 5>&spp, &spp[0],size(8), maptype MEMBER_OF \| IMPLICIT \| FROM \| TO 6>&spp[0], &spp[0][0].f size(D) with maptype MEMBER_OF \|IMPLICIT \|PTR_AND_OBJ, @.omp_mapper._ZTS1C.default maptype with/without OMP_MAP_PTR_AND_OBJ For "2" and "5", since it is mapping pointer and pointee pair, PTR_AND_OBJ should be set But for "6" the PTR_AND_OBJ should not set. However, "5" is duplicate with "1" can be skip. To fix "2", during the call to emitCombinEntry with false with NotTargetParams instead !PartialStruct.PreliminaryMapData.BasePointers.empty(), since all captures need to be TARGET_PARAM And inside emitCombineEntry: check !PartialStruct.PreliminaryMapData.BasePointers.empty() to set PTR_AND_OBJ For "5" and "6": the fix in generateInfoForComponentList: Add new variable IsPartialMapped set with !PartialStruct.PreliminaryMapData.BasePointers.empty(); When that is true, skip generate "5" and don"t set IsExpressionFirstInfo to false, so that PTR_AND_OBJ would be set. After fix: will have 5 maps instead 6 1>&spp, &spp[0], size 8, maptype TARGET_PARAM \| FROM \| TO 2>&spp[0], &spp[0][0], size(D), maptype PTR_AND_OBJ, nullptr 3>&spp[0], &spp[0][0].e, size(e), maptype MEMBER_OF_2 \| FROM \| TO 4>&spp[0], &spp[0][0].h, size(h), maptype MEMBER_OF_2 \| FROM \| TO 5>&spp[0], &spp[0][0].f size(32), maptype MEMBER_OF_2 \| IMPLICIT, @.omp_mapper._ZTS1C.default For map(sppp[0][0][0]): after fix: will have 6 maps instead 8. https://github.com/llvm/llvm-project/pull/101903	2024-08-05 08:01:11 -07:00
Krzysztof Parzyszek	243b27f7e1	[clang][OpenMP] Rename `varlists` to `varlist`, NFC (#101058 ) It returns a range of variables (via Expr*), not a range of lists.	2024-07-30 08:11:09 -05:00
Pranav Bhandarkar	5b4e5f8ac6	[OpenMPIRBuilder][Clang][NFC] - Combine `emitOffloadingArrays` and `emitOffloadingArraysArgument` in OpenMPIRBuilder (#97088 ) This patch introduces a new interface in `OpenMPIRBuilder` that combines the creation of the so-called offloading pointer arrays and their subsequent preparation as arguments to the OpenMP runtime library. We then use this in Clang. This is intended to be used in the near future by other frontends such as Flang when lowering MLIR to LLVMIR.	2024-07-25 16:28:11 -05:00
Shivam Gupta	26b70707fc	[Clang] Remove some dead code in getNumTeamsExprForTargetDirective (#95695 ) This was reported in https://pvs-studio.com/en/blog/posts/cpp/1126/, fragment N9. V523 The 'then' statement is equivalent to the subsequent code fragment. CGOpenMPRuntime.cpp:6040, 6036 --------- Co-authored-by: Shivam Gupta <shivma98.tkg@gmail.com>	2024-07-25 12:22:40 +05:30
Joachim	4782a4ab0a	[OpenMP] Fix calculation of dependencies for multi-dimensional iteration space (#99347 ) The expectation for multiple iterators used in a single depend clause (`depend(iterator(i=0:5,j=0:5), in:x[i][j])`) is that the iterator space is the product of the iteration vectors (25 in that case). The current codeGen only works correctly, if `numIterators() = 1`. For more iterators, the execution results in runtime assertions or segfaults. The modified codeGen first calculates the iteration space, then multiplies to the number of dependencies in the depend clause and finally adds to the total number of iterator dependencies.	2024-07-18 07:41:41 +02:00
Michael Buch	4497ec293a	[clang][CGRecordLayout] Remove dependency on isZeroSize (#96422 ) This is a follow-up from the conversation starting at https://github.com/llvm/llvm-project/pull/93809#issuecomment-2173729801 The root problem that motivated the change are external AST sources that compute `ASTRecordLayout`s themselves instead of letting Clang compute them from the AST. One such example is LLDB using DWARF to get the definitive offsets and sizes of C++ structures. Such layouts should be considered correct (modulo buggy DWARF), but various assertions and lowering logic around the `CGRecordLayoutBuilder` relies on the AST having `[[no_unique_address]]` attached to them. This is a layout-altering attribute which is not encoded in DWARF. This causes us LLDB to trip over the various LLVM<->Clang layout consistency checks. There has been precedent for avoiding such layout-altering attributes from affecting lowering with externally-provided layouts (e.g., packed structs). This patch proposes to replace the `isZeroSize` checks in `CGRecordLayoutBuilder` (which roughly means "empty field with [[no_unique_address]]") with checks for `CodeGen::isEmptyField`/`CodeGen::isEmptyRecord`. Details The main strategy here was to change the `isZeroSize` check in `CGRecordLowering::accumulateFields` and `CGRecordLowering::accumulateBases` to use the `isEmptyXXX` APIs instead, preventing empty fields from being added to the `Members` and `Bases` structures. The rest of the changes fall out from here, to prevent lookups into these structures (for field numbers or base indices) from failing. Added `isEmptyRecordForLayout` and `isEmptyFieldForLayout` (open to better naming suggestions). The main difference to the existing `isEmptyRecord`/`isEmptyField` APIs, is that the `isEmptyXXXForLayout` counterparts don't have special treatment for `unnamed bitfields`/arrays and also treat fields of empty types as if they had `[[no_unique_address]]` (i.e., just like the `AsIfNoUniqueAddr` in `isEmptyField` does).	2024-07-16 04:59:51 +01:00
Sushant Gokhale	c7ee20433c	[OpenMP] Fix stack corruption due to argument mismatch (#96386 ) While lowering (#pragma omp target update from), clang's generated .omp_task_entry. is setting up 9 arguments while calling __tgt_target_data_update_nowait_mapper. At the same time, in __tgt_target_data_update_nowait_mapper, call to targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming it has the argument count listed in the signature. AARCH64 asm sequence for this is as follows (removed unrelated insns): ` .omp_task_entry..108: sub sp, sp, #32 stp x29, x30, sp, #16 // 16-byte Folded Spill add x29, sp, #16 str x8, sp, #8. // stack canary str xzr, [sp] bl __tgt_target_data_update_nowait_mapper __tgt_target_data_update_nowait_mapper: sub sp, sp, #32 stp x29, x30, sp, #16 // 16-byte Folded Spill add x29, sp, #16 str x8, sp, #8 // stack canary // Sibcall argument setup adrp x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb ldr x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb] stp x9, x8, x29, #16 adrp x8, .L.str.8 add x8, x8, :lo12:.L.str.8 str x8, x29, #32. <==. This is the insn that erases $fp ldp x29, x30, sp, #16 // 16-byte Folded Reload add sp, sp, #32 // Sibcall b ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD ` On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry. sets up only single space on stack and this results in ovewriting $fp and subsequent stack corruption. This issue can be credited to discrepancy of __tgt_target_data_update_nowait_mapper signature in openmp/libomptarget/include/omptarget.h taking 13 arguments while clang/lib/CodeGen/CGOpenMPRuntime.cpp and llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments. This patch modifies __tgt_target_data_update_nowait_mapper signature to match .omp_task_entry usage(and other 2 files mentioned above). Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>	2024-07-05 10:39:15 +05:30
jyu2-git	32f7672acc	[Clang][OpenMP] This is addition fix for #92210 . (#94802 ) Fix another runtime problem when explicit map both pointer and pointee in target data region. In #92210, problem is only addressed in target region, but missing for target data region. The change just passing AreBothBasePtrAndPteeMapped in generateInfoForComponentList when processing target data. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-07-03 20:56:53 -07:00
Gheorghe-Teodor Bercea	1a478a69bc	[OpenMP][offload] Fix dynamic schedule tracking (#97065 ) This patch fixes the dynamic schedule tracking.	2024-07-01 10:23:11 -04:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Ahmed Bougacha	3575d23ca8	[clang][CodeGen] Remove unused LValue::getAddress CGF arg. (#92465 ) This is in effect a revert of f139ae3d93797, as we have since gained a more sophisticated way of doing extra IRGen with the addition of RawAddress in #86923.	2024-05-20 10:23:04 -07:00
jyu2-git	8e00703be9	[Clang][OpenMP] Fix runtime problem when explicit map both pointer and pointee (#92210 ) ponter int p for following map, test currently crash. map(p, p[:100]) or map(p, p[1]) Currly IR looks like // &p, &p, sizeof(int), TARGET_PARAM \| TO \| FROM // &p, p[0], 100sizeof(float) TO \| FROM Worrking IR is // map(p, p[0:100]) to map(p[0:100]) // &p, &p[0], 100sizeof(float), TARGET_PARAM \| TO \| FROM \| PTR_AND_OBJ The change is add new argument AreBothBasePtrAndPteeMapped in generateInfoForComponentList Use that to skip map for map(p), when processing map(p[:100]) generate map with right flag.	2024-05-15 08:20:25 -07:00
Erich Keane	39adc8f423	[NFC] Generalize ArraySections to work for OpenACC in the future (#89639 ) OpenACC is going to need an array sections implementation that is a simpler version/more restrictive version of the OpenMP version. This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`, then adds an enum to choose between the two. This also fixes a couple of 'drive-by' issues that I discovered on the way, but leaves the OpenACC Sema parts reasonably unimplemented (no semantic analysis implementation), as that will be a followup patch.	2024-04-25 10:22:03 -07:00
David Pagan	a12836647e	[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278 ) IR for 'target teams loop' is now dependent on suitability of associated loop-nest. If a loop-nest: - does not contain a function call, or - the -fopenmp-assume-no-nested-parallelism has been specified, - or the call is to an OpenMP API AND - does not contain nested loop bind(parallel) directives then it can be emitted as 'target teams distribute parallel for', which is the current default. Otherwise, it is emitted as 'target teams distribute'. Added debug output indicating how 'target teams loop' was emitted. Flag is -mllvm -debug-only=target-teams-loop-codegen Added LIT tests explicitly verifying 'target teams loop' emitted as a parallel loop and a distribute loop. Updated other 'loop' related tests as needed to reflect change in IR. - These updates account for most of the changed files and additions/deletions.	2024-04-10 13:09:17 -07:00
Simon Pilgrim	110e933b7a	CGOpenMPRuntime.cpp - fix Wparentheses warning. NFC.	2024-04-04 14:59:00 +01:00
Akira Hatanaka	84780af4b0	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was reverted because it broke ubsan bots. There seems to be a bug in coroutine code-gen, which is causing EmitTypeCheck to use the wrong alignment. For now, pass alignment zero to EmitTypeCheck so that it can compute the correct alignment based on the passed type (see function EmitCXXMemberOrOperatorMemberCallExpr).	2024-03-28 06:54:36 -07:00
Akira Hatanaka	f75eebab88	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 )" (#86898 ) This reverts commit d9a685a9dd589486e882b722e513ee7b8c84870c. The commit broke ubsan bots.	2024-03-27 18:14:04 -07:00
Akira Hatanaka	d9a685a9dd	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit broke msan bots because LValue::IsKnownNonNull was uninitialized.	2024-03-27 12:24:49 -07:00
Chris B	28ddbd4a86	[NFC] Refactor ConstantArrayType size storage (#85716 ) In PR #79382, I need to add a new type that derives from ConstantArrayType. This means that ConstantArrayType can no longer use `llvm::TrailingObjects` to store the trailing optional Expr*. This change refactors ConstantArrayType to store a 60-bit integer and 4-bits for the integer size in bytes. This replaces the APInt field previously in the type but preserves enough information to recreate it where needed. To reduce the number of places where the APInt is re-constructed I've also added some helper methods to the ConstantArrayType to allow some common use cases that operate on either the stored small integer or the APInt as appropriate. Resolves #85124.	2024-03-26 14:15:56 -05:00
Akira Hatanaka	b311756450	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 )" (#86674 ) This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6. It appears that the commit broke msan bots.	2024-03-26 07:37:57 -07:00
Akira Hatanaka	8bd1f9116a	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects.	2024-03-25 18:05:42 -07:00
mikaoP	4e3310a813	[clang] Fix OMPT ident flag in combined distribute parallel for pragma (#80987 ) Authored-by: Raúl Peñacoba Veigas <rpenacob@bsc.es>	2024-03-12 07:50:35 -04:00
Joseph Huber	cc374d8056	[OpenMP] Remove `register_requires` global constructor (#80460 ) Summary: Currently, OpenMP handles the `omp requires` clause by emitting a global constructor into the runtime for every translation unit that requires it. However, this is not a great solution because it prevents us from having a defined order in which the runtime is accessed and used. This patch changes the approach to no longer use global constructors, but to instead group the flag with the other offloading entires that we already handle. This has the effect of still registering each flag per requires TU, but now we have a single constructor that handles everything. This function removes support for the old `__tgt_register_requires` and replaces it with a warning message. We just had a recent release, and the OpenMP policy for the past four releases since we switched to LLVM is that we do not provide strict backwards compatibility between major LLVM releases now that the library is versioned. This means that a user will need to recompile if they have an old binary that relied on `register_requires` having the old behavior. It is important that we actively deprecate this, as otherwise it would not solve the problem of having no defined init and shutdown order for `libomptarget`. The problem of `libomptarget` not having a define init and shutdown order cascades into a lot of other issues so I have a strong incentive to be rid of it. It is worth noting that the current `__tgt_offload_entry` only has space for a 32-bit integer here. I am planning to overhaul these at some point as well.	2024-02-21 11:33:32 -06:00
Jan Patrick Lehr	fa4780fa6c	[OpenMP][USM] Introduces -fopenmp-force-usm flag (#76571 ) This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Which I think is the only difference observable in code generation. This runtime test checks for the (non-)occurence of data movement between host and device. It does one run without the flag and one with the flag to also see that both versions behave as expected. In the case w/o the new flag data movement between host and device is expected. In the case with the flag such data movement should not be present / reported.	2024-01-22 21:59:26 +01:00
Gheorghe-Teodor Bercea	4ef6587715	[Clang][OpenMP] Fix mapping of structs to device (#75642 ) Fix mapping of structs to device. The following example fails: ``` #include <stdio.h> #include <stdlib.h> struct Descriptor { int datum; long int x; int xi; long int arr[1][30]; }; int main() { Descriptor dat = Descriptor(); dat.datum = (int )malloc(sizeof(int)*10); dat.xi = 3; dat.arr[0][0] = 1; #pragma omp target enter data map(to: dat.datum[:10]) map(to: dat) #pragma omp target { dat.xi = 4; dat.datum[dat.arr[0][0]] = dat.xi; } #pragma omp target exit data map(from: dat) return 0; } ``` This is a rework of the previous attempt: https://github.com/llvm/llvm-project/pull/72410	2023-12-18 09:47:59 -05:00
jyu2-git	0113722d82	[OpenMP] Fix runtime problem due to wrong map size. (#74692 ) Currently we are missing set up-boundary address for FinalArraySection as highests elements in partial struct data. Currently for: \#pragma omp target map(D.a) map(D.b[:2]) The size is: %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0 %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1 %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0 %2 = getelementptr float, ptr %arrayidx, i32 1 %3 = ptrtoint ptr %2 to i64 %4 = ptrtoint ptr %a to i64 %5 = sub i64 %3, %4 %6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64) Where %2 is wrong for (D.b[:2]) is pointer to first element of array section. It should pointe to last element of array section. The fix is to emit the pointer to the last element of array section and use this pointer as the highest element in partial struct data. After change IR: %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0 %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1 %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0 %b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1 %arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1 %1 = getelementptr float, ptr %arrayidx2, i32 1 %2 = ptrtoint ptr %1 to i64 %3 = ptrtoint ptr %a to i64 %4 = sub i64 %2, %3 %5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)	2023-12-07 09:38:56 -08:00
Ivan R. Ivanov	065796bb92	[clang][OpenMP] Fix missing DI for __kmpc_global_thread_num (#73856 ) Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>	2023-11-30 13:21:03 -06:00
Joseph Huber	237adfca4e	[OpenMP] Rework handling of global ctor/dtors in OpenMP (#71739 ) Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549	2023-11-10 14:53:53 -06:00
Vlad Serebrennikov	dda8e3de35	[clang][NFC] Refactor `ImplicitParamDecl::ImplicitParamKind` This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.	2023-11-06 12:01:09 +03:00
Vlad Serebrennikov	edd690b02e	[clang][NFC] Refactor `TagTypeKind` (#71160 ) This patch converts TagTypeKind into scoped enum. Among other benefits, this allows us to forward-declare it where necessary.	2023-11-03 21:45:39 +04:00
Vlad Serebrennikov	50dec541f3	[clang][NFC] Refactor `OMPDeclareReductionDecl::InitKind` This patch moves `OMPDeclareReductionDecl::InitKind` to DeclBase.h, so that it's complete at the point where corresponding bit-field is declared. This patch also converts it to scoped enum named `OMPDeclareReductionInitKind`	2023-11-01 12:40:13 +03:00
Vlad Serebrennikov	49fd28d960	[clang][NFC] Refactor `ArrayType::ArraySizeModifier` This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.	2023-10-31 18:06:34 +03:00
Youngsuk Kim	c1183399a8	[clang] Remove no-op ptr-to-ptr bitcasts (NFC)	2023-10-30 16:51:55 -05:00

1 2 3 4 5 ...

850 Commits