llvm-project

Author	SHA1	Message	Date
jyu2-git	32f7672acc	[Clang][OpenMP] This is addition fix for #92210 . (#94802 ) Fix another runtime problem when explicit map both pointer and pointee in target data region. In #92210, problem is only addressed in target region, but missing for target data region. The change just passing AreBothBasePtrAndPteeMapped in generateInfoForComponentList when processing target data. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-07-03 20:56:53 -07:00
Gheorghe-Teodor Bercea	1a478a69bc	[OpenMP][offload] Fix dynamic schedule tracking (#97065 ) This patch fixes the dynamic schedule tracking.	2024-07-01 10:23:11 -04:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Ahmed Bougacha	3575d23ca8	[clang][CodeGen] Remove unused LValue::getAddress CGF arg. (#92465 ) This is in effect a revert of f139ae3d93797, as we have since gained a more sophisticated way of doing extra IRGen with the addition of RawAddress in #86923.	2024-05-20 10:23:04 -07:00
jyu2-git	8e00703be9	[Clang][OpenMP] Fix runtime problem when explicit map both pointer and pointee (#92210 ) ponter int p for following map, test currently crash. map(p, p[:100]) or map(p, p[1]) Currly IR looks like // &p, &p, sizeof(int), TARGET_PARAM \| TO \| FROM // &p, p[0], 100sizeof(float) TO \| FROM Worrking IR is // map(p, p[0:100]) to map(p[0:100]) // &p, &p[0], 100sizeof(float), TARGET_PARAM \| TO \| FROM \| PTR_AND_OBJ The change is add new argument AreBothBasePtrAndPteeMapped in generateInfoForComponentList Use that to skip map for map(p), when processing map(p[:100]) generate map with right flag.	2024-05-15 08:20:25 -07:00
Erich Keane	39adc8f423	[NFC] Generalize ArraySections to work for OpenACC in the future (#89639 ) OpenACC is going to need an array sections implementation that is a simpler version/more restrictive version of the OpenMP version. This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`, then adds an enum to choose between the two. This also fixes a couple of 'drive-by' issues that I discovered on the way, but leaves the OpenACC Sema parts reasonably unimplemented (no semantic analysis implementation), as that will be a followup patch.	2024-04-25 10:22:03 -07:00
David Pagan	a12836647e	[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278 ) IR for 'target teams loop' is now dependent on suitability of associated loop-nest. If a loop-nest: - does not contain a function call, or - the -fopenmp-assume-no-nested-parallelism has been specified, - or the call is to an OpenMP API AND - does not contain nested loop bind(parallel) directives then it can be emitted as 'target teams distribute parallel for', which is the current default. Otherwise, it is emitted as 'target teams distribute'. Added debug output indicating how 'target teams loop' was emitted. Flag is -mllvm -debug-only=target-teams-loop-codegen Added LIT tests explicitly verifying 'target teams loop' emitted as a parallel loop and a distribute loop. Updated other 'loop' related tests as needed to reflect change in IR. - These updates account for most of the changed files and additions/deletions.	2024-04-10 13:09:17 -07:00
Simon Pilgrim	110e933b7a	CGOpenMPRuntime.cpp - fix Wparentheses warning. NFC.	2024-04-04 14:59:00 +01:00
Akira Hatanaka	84780af4b0	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was reverted because it broke ubsan bots. There seems to be a bug in coroutine code-gen, which is causing EmitTypeCheck to use the wrong alignment. For now, pass alignment zero to EmitTypeCheck so that it can compute the correct alignment based on the passed type (see function EmitCXXMemberOrOperatorMemberCallExpr).	2024-03-28 06:54:36 -07:00
Akira Hatanaka	f75eebab88	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 )" (#86898 ) This reverts commit d9a685a9dd589486e882b722e513ee7b8c84870c. The commit broke ubsan bots.	2024-03-27 18:14:04 -07:00
Akira Hatanaka	d9a685a9dd	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit broke msan bots because LValue::IsKnownNonNull was uninitialized.	2024-03-27 12:24:49 -07:00
Chris B	28ddbd4a86	[NFC] Refactor ConstantArrayType size storage (#85716 ) In PR #79382, I need to add a new type that derives from ConstantArrayType. This means that ConstantArrayType can no longer use `llvm::TrailingObjects` to store the trailing optional Expr*. This change refactors ConstantArrayType to store a 60-bit integer and 4-bits for the integer size in bytes. This replaces the APInt field previously in the type but preserves enough information to recreate it where needed. To reduce the number of places where the APInt is re-constructed I've also added some helper methods to the ConstantArrayType to allow some common use cases that operate on either the stored small integer or the APInt as appropriate. Resolves #85124.	2024-03-26 14:15:56 -05:00
Akira Hatanaka	b311756450	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 )" (#86674 ) This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6. It appears that the commit broke msan bots.	2024-03-26 07:37:57 -07:00
Akira Hatanaka	8bd1f9116a	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects.	2024-03-25 18:05:42 -07:00
mikaoP	4e3310a813	[clang] Fix OMPT ident flag in combined distribute parallel for pragma (#80987 ) Authored-by: Raúl Peñacoba Veigas <rpenacob@bsc.es>	2024-03-12 07:50:35 -04:00
Joseph Huber	cc374d8056	[OpenMP] Remove `register_requires` global constructor (#80460 ) Summary: Currently, OpenMP handles the `omp requires` clause by emitting a global constructor into the runtime for every translation unit that requires it. However, this is not a great solution because it prevents us from having a defined order in which the runtime is accessed and used. This patch changes the approach to no longer use global constructors, but to instead group the flag with the other offloading entires that we already handle. This has the effect of still registering each flag per requires TU, but now we have a single constructor that handles everything. This function removes support for the old `__tgt_register_requires` and replaces it with a warning message. We just had a recent release, and the OpenMP policy for the past four releases since we switched to LLVM is that we do not provide strict backwards compatibility between major LLVM releases now that the library is versioned. This means that a user will need to recompile if they have an old binary that relied on `register_requires` having the old behavior. It is important that we actively deprecate this, as otherwise it would not solve the problem of having no defined init and shutdown order for `libomptarget`. The problem of `libomptarget` not having a define init and shutdown order cascades into a lot of other issues so I have a strong incentive to be rid of it. It is worth noting that the current `__tgt_offload_entry` only has space for a 32-bit integer here. I am planning to overhaul these at some point as well.	2024-02-21 11:33:32 -06:00
Jan Patrick Lehr	fa4780fa6c	[OpenMP][USM] Introduces -fopenmp-force-usm flag (#76571 ) This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Which I think is the only difference observable in code generation. This runtime test checks for the (non-)occurence of data movement between host and device. It does one run without the flag and one with the flag to also see that both versions behave as expected. In the case w/o the new flag data movement between host and device is expected. In the case with the flag such data movement should not be present / reported.	2024-01-22 21:59:26 +01:00
Gheorghe-Teodor Bercea	4ef6587715	[Clang][OpenMP] Fix mapping of structs to device (#75642 ) Fix mapping of structs to device. The following example fails: ``` #include <stdio.h> #include <stdlib.h> struct Descriptor { int datum; long int x; int xi; long int arr[1][30]; }; int main() { Descriptor dat = Descriptor(); dat.datum = (int )malloc(sizeof(int)*10); dat.xi = 3; dat.arr[0][0] = 1; #pragma omp target enter data map(to: dat.datum[:10]) map(to: dat) #pragma omp target { dat.xi = 4; dat.datum[dat.arr[0][0]] = dat.xi; } #pragma omp target exit data map(from: dat) return 0; } ``` This is a rework of the previous attempt: https://github.com/llvm/llvm-project/pull/72410	2023-12-18 09:47:59 -05:00
jyu2-git	0113722d82	[OpenMP] Fix runtime problem due to wrong map size. (#74692 ) Currently we are missing set up-boundary address for FinalArraySection as highests elements in partial struct data. Currently for: \#pragma omp target map(D.a) map(D.b[:2]) The size is: %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0 %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1 %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0 %2 = getelementptr float, ptr %arrayidx, i32 1 %3 = ptrtoint ptr %2 to i64 %4 = ptrtoint ptr %a to i64 %5 = sub i64 %3, %4 %6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64) Where %2 is wrong for (D.b[:2]) is pointer to first element of array section. It should pointe to last element of array section. The fix is to emit the pointer to the last element of array section and use this pointer as the highest element in partial struct data. After change IR: %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0 %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1 %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0 %b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1 %arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1 %1 = getelementptr float, ptr %arrayidx2, i32 1 %2 = ptrtoint ptr %1 to i64 %3 = ptrtoint ptr %a to i64 %4 = sub i64 %2, %3 %5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)	2023-12-07 09:38:56 -08:00
Ivan R. Ivanov	065796bb92	[clang][OpenMP] Fix missing DI for __kmpc_global_thread_num (#73856 ) Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>	2023-11-30 13:21:03 -06:00
Joseph Huber	237adfca4e	[OpenMP] Rework handling of global ctor/dtors in OpenMP (#71739 ) Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549	2023-11-10 14:53:53 -06:00
Vlad Serebrennikov	dda8e3de35	[clang][NFC] Refactor `ImplicitParamDecl::ImplicitParamKind` This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.	2023-11-06 12:01:09 +03:00
Vlad Serebrennikov	edd690b02e	[clang][NFC] Refactor `TagTypeKind` (#71160 ) This patch converts TagTypeKind into scoped enum. Among other benefits, this allows us to forward-declare it where necessary.	2023-11-03 21:45:39 +04:00
Vlad Serebrennikov	50dec541f3	[clang][NFC] Refactor `OMPDeclareReductionDecl::InitKind` This patch moves `OMPDeclareReductionDecl::InitKind` to DeclBase.h, so that it's complete at the point where corresponding bit-field is declared. This patch also converts it to scoped enum named `OMPDeclareReductionInitKind`	2023-11-01 12:40:13 +03:00
Vlad Serebrennikov	49fd28d960	[clang][NFC] Refactor `ArrayType::ArraySizeModifier` This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.	2023-10-31 18:06:34 +03:00
Youngsuk Kim	c1183399a8	[clang] Remove no-op ptr-to-ptr bitcasts (NFC)	2023-10-30 16:51:55 -05:00
Johannes Doerfert	31b91213bd	[OpenMP] Unify the min/max thread/teams pathways We used to pass the min/max threads/teams values through different paths from the frontend to the middle end. This simplifies the situation by passing the values once, only when we will create the KernelEnvironment, which contains the values. At that point we also manifest the metadata, as appropriate. Some footguns have also been removed, e.g., our target check is now triple-based, not calling convention-based, as the latter is dependent on the ordering of operations. The types of the values have been unified to int32_t.	2023-10-29 10:53:20 -07:00
Johannes Doerfert	0ba57c8bba	[OpenMP] Pass min/max thread and team count to the OMPIRBuilder (#70247 ) We now provide the information about the min/max thread and team count from to the OMPIRBuilder, no matter what the source was. That means we unify `thread_limit`, `num_teams`, `num_threads` handling with the target specific attriutes (`__launch_bounds__` and `amdgpu_flat_work_group_size`). This is in preparation to pass the values to the runtime, and to allow the middle-end (OpenMP-opt) to tighten the values if it seems appropriate. There is no "real" change after this commit.	2023-10-26 14:45:07 -07:00
David Pagan	52315f9b75	[clang][OpenMP] Fix target data if/logical expression assert fail (#70268 ) Fixed assertion failure Basic Block in function 'main' does not have terminator! label %land.end caused by premature setting of CodeGenIP upon entry to emitTargetDataCalls, where subsequent evaluation of logical expression created new basic blocks, leaving CodeGenIP pointing to the wrong basic block. CodeGenIP is now set near the end of the function, just prior to generating a comparison of the logical expression result (from the if clause) which uses CodeGenIP to insert new IR.	2023-10-26 13:19:37 -07:00
agozillon	bc0c1783fd	[Clang][OpenMP][OMPIRBuilder] Move Clang's OpenMP Member/MemberOf flag helpers into the OMPIRBuilder (#67844 ) This patch seeks to move the following functions to the OMPIRBuilder: - getFlagMemberOffset - getMemberOfFlag - setCorrectMemberOfFlag These small helper functions help set the end bits of the OpenMPOffloadMappingFlags flag that correspond to the reserved segment for OMP_MAP_MEMBER_OF. They will be of use in the future for lowering MLIR types/values that can contian members and can be lowered similarly to a structure or class type within the OpenMPToLLVMIRTranslation step of the OpenMP dialects lowering to LLVM-IR. In particular for Flang which currently uses this flow. Types with descriptors like pointers/allocatables, and likely derived types in certain cases can be lowered as if they were structures with explicitly mapped members.	2023-10-03 15:20:44 +02:00
Matheus Izvekov	9c89b29555	-fsanitize=function: fix MSVC hashing to sugared type (#66816 ) Hashing the sugared type instead of the canonical type meant that a simple example like this would always fail under MSVC: ``` static auto l() {} int main() { auto a = l; a(); } ``` `clang --target=x86_64-pc-windows-msvc -fno-exceptions -fsanitize=function -g -O0 -fuse-ld=lld -o test.exe test.cc` produces: ``` test.cc:4:3: runtime error: call to function l through pointer to incorrect function type 'void (*)()' ```	2023-10-02 19:09:39 +02:00
Corentin Jabot	af4751738d	[C++] Implement "Deducing this" (P0847R7) This patch implements P0847R7 (partially), CWG2561 and CWG2653. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D140828	2023-10-02 14:33:02 +02:00
Jan Svoboda	8a2fb1391b	[clang] NFCI: Use `FileEntryRef` in `SourceManager::FileInfos` (#67742 )	2023-09-29 08:04:34 -07:00
Sergio Afonso	9058762789	[OpenMP][Flang][MLIR] Lowering of requires directive from MLIR to LLVM IR Default atomic ordering information is processed in the OpenMP dialect to LLVM IR lowering stage at every spot where an operation can be affected by it. The rest of clauses are stored globally in the OpenMPIRBuilderConfig object before starting that lowering stage, so that the OMPIRBuilder can conditionally modify code generation depending on these. At the end of the process, the omp.requires attribute is itself lowered into a global constructor that passes these clauses as flags to the OpenMP runtime. Depends on D147217, D147218 and D158278. Differential Revision: https://reviews.llvm.org/D147219	2023-09-14 10:35:44 +01:00
Sergio Afonso	094a63a20b	[OpenMP][OMPIRBuilder] OpenMPIRBuilder support for requires directive This patch updates the `OpenMPIRBuilderConfig` structure to hold all available 'requires' clauses, and it replicates part of the code generation for the 'requires' registration function from clang in the `OMPIRBuilder`, to be used with flang. Porting the rest of features of the clang implementation to the IRBuilder and sharing it between clang and flang remains for a future patch, due to the complexity of the logic selecting the attributes of the generated registration function. Differential Revision: https://reviews.llvm.org/D147217	2023-09-14 10:33:54 +01:00
Chris Bieneman	400d3261a0	[HLSL] Cleanup support for `this` as an l-value The goal of this change is to clean up some of the code surrounding HLSL using CXXThisExpr as a non-pointer l-value. This change cleans up a bunch of assumptions and inconsistencies around how the type of `this` is handled through the AST and code generation. This change is be mostly NFC for HLSL, and completely NFC for other language modes. This change introduces a new member to query for the this object's type and seeks to clarify the normal usages of the this type. With the introudction of HLSL to clang, CXXThisExpr may now be an l-value and behave like a reference type rather than C++'s normal method of it being an r-value of pointer type. With this change there are now three ways in which a caller might need to query the type of `this`: * The type of the `CXXThisExpr` * The type of the object `this` referrs to * The type of the implicit (or explicit) `this` argument This change codifies those three ways you may need to query respectively as: * CXXMethodDecl::getThisType() * CXXMethodDecl::getThisObjectType() * CXXMethodDecl::getThisArgType() This change then revisits all uses of `getThisType()`, and in cases where the only use was to resolve the pointee type, it replaces the call with `getThisObjectType()`. In other cases it evaluates whether the desired returned type is the type of the `this` expr, or the type of the `this` function argument. The `this` expr type is used for creating additional expr AST nodes and for member lookup, while the argument type is used mostly for code generation. Additionally some cases that used `getThisType` in simple queries could be substituted for `getThisObjectType`. Since `getThisType` is implemented in terms of `getThisObjectType` calling the later should be more efficient if the former isn't needed. Reviewed By: aaron.ballman, bogner Differential Revision: https://reviews.llvm.org/D159247	2023-09-05 19:38:50 -05:00
Takuya Shimizu	01b88dd66d	[NFC] Remove unused variables declared in conditions D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}` This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch. Differential Revision: https://reviews.llvm.org/D158016	2023-08-30 10:05:06 +09:00
Kazu Hirata	24fcc0a7c5	[clang] Remove redundant control flow statements (NFC)	2023-08-27 12:13:25 -07:00
Sandeep Kosuri	08bbff4aad	[OpenMP] Codegen support for thread_limit on target directive for host offloading - This patch adds support for thread_limit clause on target directive according to OpenMP 51 [2.14.5] - The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region. Differential Revision: https://reviews.llvm.org/D152054	2023-08-26 22:18:49 -05:00
Joseph Huber	9da61aed75	[OpenMP] Emit offloading entries for indirect target variables OpenMP 5.1 allows emission of the `indirect` clause on declare target functions, see https://www.openmp.org/spec-html/5.1/openmpsu70.html#x98-1080002.14.7. The intended use of this is to permit calling device functions via their associated host pointer. In order to do this the first step will be building a map associating these variables. Doing this will require the same offloading entry handling we use for other kernels and globals. We intentionally emit a new global on the device side. Although it's possible to look up the device function's address directly, this would require changing the visibility and would prevent us from making static functions indirect. Also, the CUDA toolchain will optimize out unused functions and using a global prevents that. The downside is that the runtime will need to read the global and copy its value, but there shouldn't be any other costs. Note that this patch just performs the codegen, currently this new offloading entry type is unused and will be ignored by the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D157738	2023-08-24 18:21:13 -05:00
Johannes Doerfert	c5488c8dcc	[OpenMP] Properly set static thread limit (w/o analysis) We used to have two separate implementations to derive the number of threads used in a target region. This lead us to sometimes miss out on user provided thread bounds (num_threads, or thread_limit) when we looked for "constant default values". If we might miss out on the presence of those bounds, we cannot set the thread_limit statically since the runtime will try to honor user input rather than cap it at the "preferred default". This patch replaces the secondary implementation with the primary in a mode that will not emit code but just look for the presence, and potentially upper bounds, of thread limiting clauses. The runtime test would not pass without this rewrite as we missed some clauses, set the static limit on the device to the preferred value, but then violated that value at runtime. Fixes: https://github.com/llvm/llvm-project/issues/64845 Differential Revision: https://reviews.llvm.org/D158381	2023-08-23 11:12:03 -07:00
Jonas Hahnfeld	2f3fe3ed97	[CodeGen] Remove Constant arguments from linkage functions, NFCI. This was unused since commit dd2362a8ba last year. Differential Revision: https://reviews.llvm.org/D156891	2023-08-17 08:28:51 +02:00
Akash Banerjee	5d9ccd7a96	[OpenMP] Migrate dispatch related utility functions from Clang codegen to OMPIRBuilder Migrate createForStaticInitFunction, createDispatchInitFunction, createDispatchNextFunction and createDispatchFiniFunction from Clang CodeGen to OMPIRBuilder. Differential Revision: https://reviews.llvm.org/D157994	2023-08-16 16:35:28 +01:00
Hao Jin	c384e79675	[OpenMP][IR] Set correct alignment for internal variables OpenMP runtime functions assume the pointers are aligned to sizeof(pointer), but it is being aligned incorrectly. Fix with the proper alignment in the IR builder. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D157040	2023-08-11 15:20:15 -04:00
Joseph Huber	141c4e7a94	[OpenMP] Do not always emit unused extern variables Currently, the precense of the OpenMP target declare metadata requires that we always codegen a global declaration. This is undesirable in the case that we could defer or omit this declaration as is common with unused extern variables. This is important as it allows us, in the runtime, to rely on static linking semantics to omit unused symbols so they are not included when the user links it in. This patch changes the check for always emitting these variables. Because of this we also need to extend this logic to the generation of the offloading entries. This has the result of derring the offload entry generation to the canonical definitoin. So we are effectively assuming whoever owns the storage for this variable will perform that operation. This makes an exception for `link` attributes as those require their own special handling. Let me know if this is sound in the implementation, I do not have the largest view of the standards here. Fixes: https://github.com/llvm/llvm-project/issues/64133 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D156368	2023-07-28 11:52:05 -05:00
Johannes Doerfert	08a220764b	Reapply "[OpenMP] Add the `ompx_attribute` clause for target directives" This reverts commit 0d12683046ca75fb08e285f4622f2af5c82609dc and reapplies ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2 with an extension to fix the Flang build. Differential Revision: https://reviews.llvm.org/D156184	2023-07-25 10:40:35 -07:00
Aaron Ballman	0d12683046	Revert "[OpenMP] Add the `ompx_attribute` clause for target directives" This reverts commit ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2. The changes broke several bots: https://lab.llvm.org/buildbot/#/builders/176/builds/3408 https://lab.llvm.org/buildbot/#/builders/198/builds/4028 https://lab.llvm.org/buildbot/#/builders/197/builds/8491 https://lab.llvm.org/buildbot/#/builders/197/builds/8491	2023-07-25 07:57:36 -04:00
Johannes Doerfert	ef9ec4bbcc	[OpenMP] Add the `ompx_attribute` clause for target directives CUDA and HIP have kernel attributes to tune the code generation (in the backend). To reuse this functionality for OpenMP target regions we introduce the `ompx_attribute` clause that takes these kernel attributes and emits code as if they had been attached to the kernel fuction (which is implicitly generated). To limit the impact, we only support three kernel attributes: `amdgpu_waves_per_eu`, for AMDGPU `amdgpu_flat_work_group_size`, for AMDGPU `launch_bounds`, for NVPTX The existing implementations of those attributes are used for error checking and code generation. `ompx_attribute` can be attached to any executable target region and it can hold more than one kernel attribute. Differential Revision: https://reviews.llvm.org/D156184	2023-07-24 22:04:45 -07:00
Akash Banerjee	227012cbd7	[OpenMP] Migrate device code privatisation from Clang CodeGen to OMPIRBuilder This patch migrates the UseDevicePtr and UseDeviceAddr clause related code for handling privatisation from Clang codegen to the OMPIRBuilder Depends on D150860 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D152554	2023-07-12 12:03:28 +01:00

1 2 3 4 5 ...

877 Commits