llvm-project

Author	SHA1	Message	Date
Sergio Afonso	d87964de78	[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533 ) This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback from leaving the IR in an invalid state while the codegen process continues, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue to always return 'success'.	2024-10-25 11:30:16 +01:00
Kareem Ergawy	ad70f3e095	[flang][OpenMP] Support `target enter|update|exit .. nowait` (#113305 ) Extends `nowait` support to other device directives. This PR refactors the task generation utils used for the `target` directive so that they are general enough to be reused for other device directives as well.	2024-10-23 10:48:54 +02:00
NimishMishra	645e6f1114	[llvm][OpenMP] Handle complex types in atomic read (#111377 ) This patch adds functionality for atomically reading `llvm.struct` types. Fixes: https://github.com/llvm/llvm-project/issues/93441	2024-10-22 18:34:22 -07:00
Kareem Ergawy	d0d03805f8	[flang][OpenMP] Support `target ... nowait` (#111823 ) Adds MLIR to LLVM lowering support for `target ... nowait`. This leverages the already existing code-gen patterns for `task` by treating `target ... nowait` as `task ... if(1)` and `target` (without `nowait`) as `task ... if(0)`, similar to what Clang does.	2024-10-15 14:39:16 +02:00
NimishMishra	aec87a2143	[llvm][mlir][flang][OpenMP] Emit __atomic_load and __atomic_compare_exchange libcalls for complex types in atomic update (#92364 ) This patch adds functionality to emit the relevant libcalls in case an atomicrmw instruction cannot be emitted (for instance, for complex types). The IRBuilder is modified to directly emit __atomic_load and __atomic_compare_exchange libcalls. The added functions follow a codegen path similar to Clang's, so that LLVM Flang generates IR very similar to Clang's. Fixes https://github.com/llvm/llvm-project/issues/83760 and https://github.com/llvm/llvm-project/issues/75138 Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>	2024-10-02 23:32:36 -07:00
Youngsuk Kim	d071fdab44	[llvm][OMPIRBuilder] Avoid Type::getPointerTo() (NFC) (#110678 ) `Type::getPointerTo()` is to be deprecated & removed soon.	2024-10-01 11:47:08 -04:00
Nikita Popov	ecb98f9fed	[IRBuilder] Remove uses of CreateGlobalStringPtr() (NFC) Since the migration to opaque pointers, CreateGlobalStringPtr() is the same as CreateGlobalString(). Normalize to the latter.	2024-09-23 16:30:50 +02:00
Jay Foad	e03f427196	[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133 ) It is almost always simpler to use {} instead of std::nullopt to initialize an empty ArrayRef. This patch changes all occurrences I could find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor could be deprecated or removed.	2024-09-19 16:16:38 +01:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Akash Banerjee	2cf36f0293	[OpenMP]Update use_device_clause lowering (#101707 ) This patch updates the use_device_ptr and use_device_addr clauses to use the mapInfoOps for lowering. This allows all the types that are handled by the map clauses, such as derived types, to also be supported by the use_device_clauses. This is patch 2/2 in a series of patches.	2024-09-04 12:36:03 +01:00
Abid Qadeer	9e08db796b	[OpenMPIRBuilder] Don't drop debug info for target region. (#80692 ) When an outlined function was generated for an omp target region, a corresponding DISubprogram was not being generated. This resulted in all the debug information for the target region being dropped. This commit adds a DISubprogram for the outlined function if there is one available for the parent function. It also updates the current debug location so that the right scope is used for the entries in the outlined function. There are places in the OpenMPIRBuilder which change the insertion point but don't update the debug location accordingly. They cause issues when debug info is enabled. I have fixed a few that I observed to cause issues, but there may be more and a systematic cleanup may be required. With this change in place, I can set source line breakpoints in a target region and run to them in a debugger.	2024-09-04 10:16:14 +01:00
Kareem Ergawy	a195e2d461	[MLIR][OpenMP] Handle privatization for global values in MLIR->LLVM translation (#104407 ) Potential fix for https://github.com/llvm/llvm-project/issues/102939 and https://github.com/llvm/llvm-project/issues/102949. The issue occurs because the CodeExtractor component only collects inputs (to parallel regions) that are defined in the same function in which the parallel region is present. However, this is problematic because if we are privatizing a global value (e.g. a `target` variable which is emitted as a global), then we miss finding that input and we do not privatize the variable. This commit attempts to fix the issue by adding a flag to the CodeExtractor so that we can collect global inputs.	2024-08-26 17:08:24 +02:00
Daniil Fukalov	0da2ba811a	[NFC] Cleanup in ADT and Analysis headers. (#104484 ) Remove unused direct includes and forward declarations from ADT and Analysis headers.	2024-08-17 13:11:18 +02:00
Shilei Tian	0551926fda	[Clang][OMPX] Add the code generation for multi-dim `thread_limit` clause (#102717 )	2024-08-16 13:59:46 -04:00
Kazu Hirata	b57038a611	[OpenMP] Use range-based for loops (NFC) (#103511 )	2024-08-14 20:03:45 -07:00
Nikita Popov	dc831e8422	[OMPIRBuilder] Use getAllOnesValue() Split out from https://github.com/llvm/llvm-project/pull/80309.	2024-08-12 16:37:42 +02:00
Shilei Tian	ee8100ba02	[Clang][OMPX] Add the code generation for multi-dim `num_teams` (#101407 ) This patch adds the code generation support for multi-dim `num_teams` clause when it is used with `target teams ompx_bare` construct.	2024-08-09 10:33:41 -04:00
arsnyder16	f7b2c2e49f	[openmp][WebAssembly] Allow openmp to compile and run under emscripten toolchain (#95169 ) * Separate wasi and emscripten as they have different constraints and abilities * Emscripten mimics Linux/POSIX by statically linking the musl runtime. This allows nearly all KMP_OS_LINUX code paths to work correctly. There are only a few places that need to be adjusted related to dynamic linking (dl_open) * Internally link openmp globals * With CommonLinkage they needed to be emitted in an assembly file; now they are defined and used within each compilation unit * With ExternalLinkage they suffer from duplicate symbols during linking for unnamed globals like reduction/critical * Interestingly this aligns with the TODO comment above this code	2024-08-07 13:00:37 -05:00
Sergio Afonso	84b1e59580	[MLIR][OpenMP][OMPIRBuilder] Add lowering support for omp.target_triples (#100156 ) This patch modifies MLIR to LLVM IR lowering of the OpenMP dialect to take into consideration the contents of the `omp.target_triples` module attribute while generating code for `omp.target` operations. It adds the `OpenMPIRBuilderConfig::TargetTriples` field and initializes it using the `amendOperation` flow of the `OpenMPToLLVMIRTranslation` pass. Some changes are introduced into the `OpenMPIRBuilder` to allow passing the information about whether a target region is intended to be offloaded from outside. The result of this change is that offloading calls are only generated when the `--offload-arch` or `-fopenmp-targets` options are given to the compiler. Otherwise, only the host fallback code is generated. This fixes linker errors currently triggered by `flang-new` if a source file containing a `target` construct is compiled without any of the aforementioned options. Several unit tests impacted by these changes, which are intended to check host code generated for `omp.target` operations, are updated to contain the new attribute. Without it, no calls to `__tgt_target_kernel` and associated control flow operations are generated. Fixes #100209.	2024-08-02 11:58:40 +01:00
Johannes Doerfert	f3bfc56327	[Offload][OpenMP] Prettify error messages by "demangling" the kernel name (#101400 ) The kernel names for OpenMP are manually mangled and not ideal when we report something to the user. We demangle them now, providing the function and line number of the target region, together with the actual kernel name.	2024-08-01 15:24:15 -07:00
Pranav Bhandarkar	5b4e5f8ac6	[OpenMPIRBuilder][Clang][NFC] - Combine `emitOffloadingArrays` and `emitOffloadingArraysArgument` in OpenMPIRBuilder (#97088 ) This patch introduces a new interface in `OpenMPIRBuilder` that combines the creation of the so-called offloading pointer arrays and their subsequent preparation as arguments to the OpenMP runtime library. We then use this in Clang. This is intended to be used in the near future by other frontends such as Flang when lowering MLIR to LLVMIR.	2024-07-25 16:28:11 -05:00
Akash Banerjee	0613454012	[OpenMP] Fix OpenMPIRBuilder generating incorrect duplicate SrcLocInfo (#100364 ) This should further fix some of the incorrect debug info being generated related to #97458	2024-07-25 17:46:18 +01:00
Johannes Doerfert	3c8efd7928	[OpenMP] Ensure the actual kernel is annotated with launch bounds (#99927 ) In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function.	2024-07-23 09:02:47 -07:00
Pranav Bhandarkar	d7e185cca9	[OMPIRBuilder] - Handle dependencies in `createTarget` (#93977 ) This patch handles dependencies specified by the `depend` clause on an OpenMP target construct. It does this much the same way Clang does, by materializing an OpenMP `task` that is tagged with the dependencies. The following functions are relevant to this patch - 1) `createTarget` - This function itself is largely unchanged except that it now accepts a vector of `DependData` objects that it simply forwards to `emitTargetCall`. 2) `emitTargetCall` - This function now checks whether an outer target-task needs to be materialized (i.e. if the `target` construct has `nowait` or a `depend` clause). If so, it calls `emitTargetTask` to do all the heavy lifting for creating and dispatching the task. 3) `emitTargetTask` - The bulk of the change is here. See the large comment at the beginning of this function explaining what it does.	2024-07-22 10:56:45 -05:00
Gheorghe-Teodor Bercea	1a478a69bc	[OpenMP][offload] Fix dynamic schedule tracking (#97065 ) This patch fixes the dynamic schedule tracking.	2024-07-01 10:23:11 -04:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919 ) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Akash Banerjee	6b1c51bc05	[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343 ) This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBuilder. In future patches, MLIR OpenMP translation will make use of these functions. Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>	2024-06-26 20:18:38 +01:00
Nikita Popov	b6a94b6bfb	[OMPIRBuilder] Use SmallPtrSet::remove_if() (NFC)	2024-06-26 14:43:01 +02:00
agozillon	aec735cf47	[Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829 ) This PR attempts to fix common block mapping for regular mapping of these types as well as when they have been marked as "declare target link". This PR should allow correct mapping of both the members of a common block and the full common block via its block symbol. The main changes were some adjustments to the Fortran OpenMP lowering to HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and adjustments to the way we handle target kernel map argument rebinding inside of the OMPIRBuilder. For the Fortran OpenMP lowering there were two changes: one prevents the implicit capture of common block members when the common block symbol itself has been marked, and the other creates intermediate member accesses inside of the target region to be used in place of those external to the target region; this prevents external usages from breaking the IsolatedFromAbove pact. For the lowering of the LLVM+OpenMP dialect to LLVM-IR, there was an adjustment to the size calculation for types to better handle cases where we pass an array as the type of a map (as opposed to the bounds and the type of the element), which occurs in the case of common blocks. There is also some adjustment to how handleDeclareTargetMapVar handles renaming of declare target symbols in the module to the reference pointer: it now only applies to those within the kernel that is currently being generated, and we also replace constants with instructions as necessary, since we cannot replace these with our reference pointer (non-constants and constants do not mix nicely). In the case of the OpenMPIRBuilder, some changes were made to defer global symbol rebinding to kernel arguments until all other arguments have been rebound. This makes sure we do not replace uses that may refer to the global (e.g. a GEP) but are themselves actually a separate argument that needs to be bound.
Currently "declare target to" still needs some work, but this may be the case for all types in conjunction with "declare target to" at the moment.	2024-06-25 20:54:04 +02:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Nikita Popov	f1075a34ab	[FileSystem] Avoid <stack> include (NFC) The standard pattern in LLVM is to directly use vectors for stacks, without an additional std::stack wrapper to rename some methods.	2024-06-21 13:44:46 +02:00
Nikita Popov	36c6632eb4	[IR] Don't include PassInstrumentation.h in PassManager.h (NFC) (#96219 ) Move PassInstrumentationAnalysis into PassInstrumentation.h and stop including it in PassManager.h (effectively inverting the direction of the dependency). Most places using PassManager are not interested in PassInstrumentation, and we no longer have any uses of it in PassManager.h itself (only in PassManagerImpl.h).	2024-06-21 08:41:16 +02:00
Mats Petersson	e5f1639342	[Flang]Fix for changed code at the end of AllocaIP. (#92430 ) Some of the OpenMP code can change the instruction pointed at by the insertion point. This leads to an assert in the compiler about BB->getParent() and IP->getParent() not matching. The fix is to rebuild the insertion point from the block, rather than use builder.restoreIP. Also, move some of the alloca generation, rather than skipping back and forth between insert points (and ensure all the allocas are done before their users are created). A simple test, mainly to ensure the minimal reproducer doesn't fail to compile in the future, is also added.	2024-06-18 21:10:41 +01:00
agozillon	0aeaa2d93d	[OMPIRBuilder][OpenMP][LLVM] Modify and use ReplaceConstant utility in convertTarget (#94541 ) This PR seeks to expand/replace the Constant -> Instruction conversion that needs to occur inside of the OpenMP Target kernel generation to allow kernel argument replacement of uses within the kernel (cannot replace constant uses within constant expressions with non-constants). It does so by making use of the new-ish utility convertUsersOfConstantsToInstructions which is a much more expansive version of what the smaller "version" of the function I wrote does, effectively expanding uses of the input argument that are constant expressions into instructions so that we can replace with the appropriate kernel argument. Also alters convertUsersOfConstantsToInstructions to optionally restrict the replacement to a function and optionally leave dead constants alone, the latter is necessary when lowering from MLIR as we cannot be sure we can remove the constants at this stage, even if rewritten to instructions the ModuleTranslation may maintain links to the original constants and utilise them in further lowering steps (as when we're lowering the kernel, the module is still in the process of being lowered). This can result in unusual ICEs later. These dead constants can be tidied up later (and appear to be in subsequent lowering from checking with emit-llvm).	2024-06-13 15:57:15 +02:00
Sameer Sahasrabuddhe	e0ac087ff0	[LoopUnroll] Consider convergence control tokens when unrolling (#91715 ) - There is no restriction on a loop with controlled convergent operations when the relevant tokens are defined and used within the loop. - When a token defined outside a loop is used inside (also called a loop convergence heart), unrolling is allowed only in the absence of remainder or runtime checks. - When a token defined inside a loop is used outside, such a loop is said to be "extended". This loop can only be unrolled by also duplicating the extended part lying outside the loop. Such unrolling is disabled for now. - Clean up loop hearts: When unrolling a loop with a heart, duplicating the heart will introduce multiple static uses of a convergence control token in a cycle that does not contain its definition. This violates the static rules for tokens, and needs to be cleaned up into a single occurrence of the intrinsic. - Spell out the initializer for UnrollLoopOptions to improve readability. Original implementation [D85605] by Nicolai Haehnle <nicolai.haehnle@amd.com>.	2024-06-06 13:13:46 +05:30
Kareem Ergawy	7db4e6c1ec	[OpenMP][LLVM] Update alloca IP after `PrivCB` in `OMPIRBUIlder` (#93920 ) Fixes a crash uncovered by [pr77666.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr77666.f90) in the test suite when delayed privatization is enabled by default. In particular, whenever `PrivCB` (the callback responsible for generating privatization logic for an OMP variable) generates a multi-block privatization region, the insertion point diverges: the BB component of the IP can become a different BB from the parent block of the instruction iterator component of the IP. This PR updates the IP to make sure that the BB is the parent block of the instruction iterator.	2024-06-05 05:13:47 +02:00
Tom Eccles	74a87548e5	[flang][MLIR][OpenMP] make reduction by-ref toggled per variable (#92244 ) Fixes #88935. Toggling reduction by-ref broke when multiple reduction clauses were used. Decisions made about the by-ref status for later clauses could then invalidate decisions for earlier clauses. For example, in `reduction(+:scalar,scalar2) reduction(+:array)`, the first clause would choose by-value reduction and generate by-value reduction regions, but then the second clause would force by-ref to support the array argument. By the time the second clause is processed, the first clause has already had the wrong kind of reduction regions generated. This is solved by deciding, per variable, whether that variable should be reduced by reference. In the above example, this allows only `array` to be reduced by ref.	2024-05-16 15:27:59 +01:00
Pranav Bhandarkar	13cd88108f	[mlir][OpenMP] - Honor dependencies in code-generation of the if clause in `omp.task` correctly (#90891 ) This patch fixes the code generation of the if clause, specifically when the condition evaluates to false and when the task directive has the depend clause on it. When the if clause of a task construct evaluates to false, then the task is an undeferred task. This undeferred task still has to honor dependencies. Previously, the OpenMPIRBuilder didn't honor dependencies. This patch fixes that. Fixes https://github.com/llvm/llvm-project/issues/90869	2024-05-13 08:54:23 -05:00
Abid Qadeer	6419496549	[flang][OMPIRBuilder] Keep debug location in sync with insert point. (#89953 ) A customer reported an issue which I have reduced to the test in the PR. If built with debug info enabled, the build fails with the following error in the verifier: `!dbg attachment points at wrong subprogram for function`. The problem happened because some of the functions in OMPIRBuilder.cpp updated the insertion point with the passed-in location but did not change the current debug location. This caused a stale debug location to be attached to the instruction. I have solved it by replacing restoreIP with updateToLocation, which updates both the insertion point and the debug location. updateToLocation is used in many places already, so this PR brings the functions that I have changed in line with the rest of the file. A slight issue is that I am not checking the return value of updateToLocation, as there is no good value I could return in that case. But if we have a condition where updateToLocation will return false, these functions will fail in any case. I have added a test that checks that the build does not fail. I was not sure what the correct location for the test should be. Happy to move it to a more appropriate location.	2024-05-10 10:12:24 +01:00
Kareem Ergawy	922ab7089b	[MLIR][OpenMP] Extend omp.private materialization support: `dealloc` (#90841 ) Extends current support for delayed privatization during translation to LLVM IR. This adds support for materializing the `dealloc` region in `omp.private` ops when this region contains clean-up/deallocation logic that needs to be executed at the end of the parallel region. This changes the `OMPIRBuilder` slightly to execute the finalization callback after the privatization callback. This allows us to collect information about privatized variables on the MLIR and LLVM sides so that we can properly emit deallocation logic.	2024-05-03 08:59:01 +02:00
Joseph Huber	0287a5cc4e	[OpenMP] Remove 'minncta' attributes from NVPTX kernels (#88398 ) Summary: Currently we treat this attribute as a minimum number of blocks scheduled on the kernel. However, the documentation states that this applies to CTAs mapped onto a single SM. Currently we just set it to the total number of blocks, which will almost always result in a warning that the value is out of range and will be ignored. We don't have a good way to automatically know how many CTAs can be put on a single SM, nor whether we should do this, so we should probably leave it to users to add manually. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm	2024-04-15 15:28:05 -05:00
Joseph Huber	2650375b3b	[OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (#87695 ) Summary: This new attribute was introduced recently. We already do this for NVPTX kernels so we should apply this for AMDGPU as well. This patch simply applies this metadata in cases where a lower bound is known	2024-04-05 07:38:01 -05:00
Stephen Tozer	9a96fb4445	Reapply "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737 )" Fixes a build error caused by an unupdated getAsInstruction callsite in clang. This reverts commit ab851f7fe946e7eed700ef9d82082eb721860189.	2024-03-19 15:49:10 +00:00
Stephen Tozer	ab851f7fe9	Revert "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737 )" Reverted due to buildbot failures: https://lab.llvm.org/buildbot/#/builders/139/builds/61717/ This reverts commit 7ef433f62c199c414bffdcac1c8ee3159b29c5f5.	2024-03-19 14:41:27 +00:00
Jeremy Morse	7ef433f62c	[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737 ) Because the RemoveDIs work is putting a debug-info bit into BasicBlock::iterator and iterators are needed for insertion, the getAsInstruction method declaration would need to use a fully defined instruction-iterator, which leads to a complicated header-inclusion-order problem. It is much simpler to instead just not insert, and make it the caller's problem to insert. This is proportionate because there are only four call-sites to getAsInstruction -- it would suck if we did this everywhere. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>	2024-03-19 14:30:41 +00:00
Akash Banerjee	e9da5f0083	[OpenMP] Fix target data region codegen being omitted for device pass (#85218 ) This patch enables the BodyCodeGen callback to still trigger for the TargetData nested region during the device pass. There may be target code nested within the TargetData region for which this is required. Also adds tests for this.	2024-03-19 13:04:23 +00:00
Tom Eccles	f46f5a01f4	[flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (#84304 ) Previously, reduction variables were always passed by value into and out of the initialization and combiner regions of the OpenMP reduction declare operation. This worked well for reductions of primitive types (and might perform better than passing by reference). But passing by reference will be useful for array and derived type reductions (e.g. to move allocation inside of the init region). Passing reductions by reference requires different LLVM-IR generation when lowering from MLIR because some of the loads/stores/allocations will now be moved inside of the init and combiner regions. This alternate code generation is requested using a new attribute on omp.wsloop and omp.parallel. Existing lowerings from MLIR are unaffected (these will continue to use by-value argument passing). Flang will continue to use by-value argument passing for trivial types unless a (hidden) command-line argument is supplied. Non-trivial types will always use the by-ref lowering. Array reductions are not ready yet (but are coming very soon). In the meantime, this is tested by forcing existing reductions to use by-ref. Commit series for by-ref OpenMP reductions 3/3 --------- Co-authored-by: Mats Petersson <mats.petersson@arm.com>	2024-03-13 14:51:09 +00:00
Leandro Lupori	64422cf826	[llvm][mlir][OMPIRBuilder] Translate omp.single's copyprivate (#80488 ) Use the new copyprivate list from omp.single to emit calls to __kmpc_copyprivate, during the creation of the single operation in OMPIRBuilder. This is patch 4 of 4, to add support for COPYPRIVATE in Flang. Original PR: https://github.com/llvm/llvm-project/pull/73128	2024-02-28 13:33:42 -03:00
agozillon	dcf4ca558c	[OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818 ) This patch seeks to add a mechanism to raise constant-sized (not ConstantExpr or runtime/dynamic) allocations into the entry block for select functions that have been inserted into a list for processing. This processing occurs during the finalize call, after OutlinedInfo regions have completed. This has currently only been utilised for createOutlinedFunction, which is triggered for TargetOp generation in the OpenMP MLIR dialect lowering to LLVM-IR. This is currently required for Target kernels generated by createOutlinedFunction to avoid subsequent optimization passes performing unintentionally malformed optimizations on AMD kernels (unsure if it occurs for other vendors). If the allocas are generated inside of the kernel, are not in the entry block, and are subsequently passed to a function, this can lead to required instructions being erased or manipulated in a way that causes the kernel to run into an HSA access error. This fix is related to a series of problems found in: https://github.com/llvm/llvm-project/issues/74603. This problem currently presents itself primarily for Flang's HLFIR AssignOp, when utilised with a scalar temporary constant on the RHS and a descriptor type on the LHS. It will generate a call to a runtime function, wrap the RHS temporary in a newly allocated descriptor (an llvm struct), and pass both the LHS and RHS descriptors into the runtime function call. This will currently be embedded into the middle of the target region in the user entry block, which means the allocas are also embedded in the middle, which seems to pose issues when later passes are executed. This issue may present itself in other HLFIR operations or unrelated operations that generate allocas as a byproduct, but for the moment, this one test case is the only scenario in which I've found this problem.
Perhaps this is not the appropriate fix, and I am very open to other suggestions. I've tried a few others (at varying levels of the flang/mlir compiler flow), but this one is the smallest and least intrusive change set. The other two that come to mind (which I have not fully looked into; I tried the former a little with blocks, but it had a few issues I'd need to think through): - Having a proper alloca-only block (or region) generated for TargetOps that we could merge into the entry block generated by convertTarget's createOutlinedFunction. - Or diverging a little from Clang's current target generation and using the CodeExtractor to generate the user code as an outlined function region invoked from the kernel we make, with our kernel arguments passed into it, similar to the current parallel generation. I am not sure how well this would intermingle with the existing parallel generation that's layered in, though. Both of these methods seem like quite a divergence from the current status quo, which I am not entirely sure is merited for the small test this change aims to fix.	2024-02-23 22:59:41 +01:00
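The anjenner commit (#105568) describes the two new `atomicrmw` operations only in words. The following is a minimal C++ model of the value each operation stores, assuming the LangRef semantics on unsigned integer operands and using `uint32_t` purely for illustration. `usubCond`, `usubSat` and `atomicUsubSat` are hypothetical helpers, not LLVM APIs; as with every `atomicrmw`, the value handed back is the memory contents before the update.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): the new value `atomicrmw usub_cond`
// stores. If old - val would wrap (the difference is "negative"), memory
// keeps the old value, i.e. the minuend.
uint32_t usubCond(uint32_t old, uint32_t val) {
  return old >= val ? old - val : old;
}

// Hypothetical helper: the new value `atomicrmw usub_sat` stores.
// A "negative" difference saturates to zero.
uint32_t usubSat(uint32_t old, uint32_t val) {
  return old >= val ? old - val : 0;
}

// Emulate the atomic read-modify-write with a compare-exchange loop.
// Like atomicrmw, it returns the value memory held before the update.
// compare_exchange_weak reloads `old` on failure, so the new value is
// recomputed from the latest contents on every retry.
uint32_t atomicUsubSat(std::atomic<uint32_t> &mem, uint32_t val) {
  uint32_t old = mem.load();
  while (!mem.compare_exchange_weak(old, usubSat(old, val))) {
  }
  return old;
}
```

For example, `usubCond(3, 10)` leaves 3 in place while `usubSat(3, 10)` stores 0, and both store 7 for `(10, 3)`. The compare-exchange loop only emulates the semantics in portable C++; a backend may lower the real instruction differently.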

313 Commits