llvm-project

Author	SHA1	Message	Date
Anchu Rajendran S	49ccf46adc	[OpenMP] [IR Builder] Changes to Support Scan Operation (#136035 ) Scan reductions are supported in OpenMP with the help of scan directive. Reduction clause of the for loop/simd directive can take an `inscan` modifier along with the body of the directive specifying a `scan` directive. This PR implements the lowering logic for scan reductions in workshare loops of OpenMP. The body of the for loop is split into two loops (Input phase loop and Scan Phase loop) and a scan reduction loop is added in the middle. The Input phase loop populates a temporary buffer with initial values that are to be reduced. The buffer is used by the reduction loop to perform scan reduction. Scan phase loop copies the values of the buffer to the reduction variable before executing the scan phase. Below is a high level view of the code generated. ``` <declare pointer to buffer> ptr omp parallel { size num_iters = <num_iters> // temp buffer allocation omp masked { buff = malloc(num_iters*scanvarstype) ptr = buff } barrier; // input phase loop for (i: 0..<num_iters>) { <input phase>; buffer = ptr; buffer[i] = red; } // scan reduction omp masked { for (int k = 0; k != ceil(log2(num_iters)); ++k) { i=pow(2,k) for (size cnt = last_iter; cnt >= i; --cnt) { buffer = ptr; buffer[cnt] op= buffer[cnt-i]; } } } barrier; // scan phase loop for (0..<num_iters>) { buffer = ptr; red = buffer[i] ; <scan phase>; } // temp buffer deletion omp masked { free(ptr) } barrier; } ``` The temporary buffer needs to be shared between all threads performing reduction since it is read/written in Input and Scan workshare Loops. This is achieved by declaring a pointer to the buffer in the shared region and dynamically allocating the buffer by the master thread. This is the reason why allocation, deallocation and scan reduction are performed within `masked`.
The code is verified to produce correct results for Fortran programs with the code changes in the PR https://github.com/llvm/llvm-project/pull/133149	2025-08-07 14:58:11 -07:00
Jeremy Morse	5b8c15c6e7	[DebugInfo] Remove getPrevNonDebugInstruction (#148859 ) With the advent of intrinsic-less debug-info, we no longer need to scatter calls to getPrevNonDebugInstruction around the codebase. Remove most of them -- there are one or two that have the "SkipPseudoOp" flag turned on, however they don't seem to be in positions where skipping anything would be reasonable.	2025-07-16 11:41:32 +01:00
Jeremy Morse	57a5f9c47e	[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383 ) There are no longer debug-info instructions, thus we don't need this skipping. Hooray!	2025-07-15 15:34:10 +01:00
Tom Eccles	a1c61ac756	[mlir][OpenMP] Allow composite SIMD REDUCTION and IF (#147568 ) Reduction support: https://github.com/llvm/llvm-project/pull/146671 IF support is fixed in this PR. The problem for the IF clause in composite constructs was that wsloop and simd both operate on the same CanonicalLoopInfo structure: with the SIMD processed first, followed by the wsloop. Previously the IF clause generated code like ``` if (cond) { while (...) { simd_loop_body; } } else { while (...) { nonsimd_loop_body; } } ``` The problem with this is that this invalidates the CanonicalLoopInfo structure to be processed by the wsloop later. To avoid this, in this patch I preserve the original loop, moving the IF clause inside of the loop: ``` while (...) { if (cond) { simd_loop_body; } else { non_simd_loop_body; } } ``` On simple examples I tried LLVM was able to hoist the if condition outside of the loop at -O3. The disadvantage of this is that we cannot add the llvm.loop.vectorize.enable attribute on either the SIMD or non-SIMD loops because they both share a loop back edge. There's no way of solving this without keeping the old design of having two different loops: which cannot be represented using only one CanonicalLoopInfo structure. I don't think the presence or absence of this attribute makes much difference. In my testing it is the llvm.loop.parallel_access metadata which makes the difference to vectorization. LLVM will vectorize if legal whether or not this attribute is there in the TRUE branch. In the FALSE branch this means the loop might be vectorized even when the condition is false: but I think this is still standards compliant: OpenMP 6.0 says that when the if clause is false that should be treated like the SIMDLEN clause is one. The SIMDLEN clause is defined as a "hint". For the same reason, SIMDLEN and SAFELEN clauses are silently ignored when SIMD IF is used.
I think it is better to implement SIMD IF and ignore SIMDLEN and SAFELEN and some vectorization encouragement metadata when combined with IF than to ignore IF because IF could have correctness consequences whereas the rest are optimization hints. For example, the user might use the IF clause to disable SIMD programmatically when it is known not safe to vectorize the loop. In this case it is not at all safe to add the parallel access or SAFELEN metadata.	2025-07-15 10:30:02 +01:00
Alexander Richardson	07e2ba445d	[AMDGPU] Set AS8 address width to 48 bits Of the 128 bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logical conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419	2025-05-19 17:26:05 -07:00
Shafik Yaghmour	c248903053	[OpenMP][NFC] Use pass by const ref for Dependencies (#139592 ) Static analysis flagged the passing of Dependencies to emitTargetCall as a place we could use std::move to avoid copying. A closer look indicated we could instead turn the parameter into a const & and not have a default value since it was only used in two lines in a test and changing those two locations was easy.	2025-05-13 09:09:37 -07:00
Kazu Hirata	2f3067ed69	[llvm] Remove unused local variables (NFC) (#138454 )	2025-05-04 09:38:16 -07:00
NimishMishra	b62afbccc8	[mlir][OpenMP] Add __atomic_store to AtomicInfo (#121055 ) This PR adds functionality for `__atomic_store` libcall in AtomicInfo. This allows for supporting complex types in `atomic write`. Fixes https://github.com/llvm/llvm-project/issues/113479 Fixes https://github.com/llvm/llvm-project/issues/115652	2025-04-29 07:53:36 -07:00
Matt Arsenault	34e7809397	unittests: Avoid using getNumUses (#136352 )	2025-04-18 23:17:26 +02:00
NimishMishra	53fa92dcad	[mlir][llvm][OpenMP] Hoist __atomic_load alloca (#132888 ) Current implementation of `__atomic_compare_exchange` uses an alloca for `__atomic_load`, leading to issues like https://github.com/llvm/llvm-project/issues/120724. This PR hoists this alloca to `AllocaIP`. Fixes: https://github.com/llvm/llvm-project/issues/120724	2025-04-09 03:01:44 -07:00
Jan Leyonberg	fbc8335311	[MLIR][OpenMP] Add codegen for teams reductions (#133310 ) This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.	2025-04-07 12:47:16 -04:00
Sergio Afonso	56975b4ecd	[OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC (#127820 ) This patch splits off the calculation of canonical loop trip counts from the creation of canonical loops. This makes it possible to reuse this logic to, for instance, populate the `__tgt_target_kernel` runtime call for SPMD kernels. This feature is used to simplify one of the existing OpenMPIRBuilder tests.	2025-02-25 10:32:54 +00:00
Akash Banerjee	785a5b4676	[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746 ) This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper directive. Since both MLIR and Clang now support custom mappers, I've changed the respective function params to no longer be optional as well. Depends on #121005	2025-02-18 17:55:48 +00:00
Abid Qadeer	5f7acf7259	[flang][OMPIRbuilder] Set debug loc on terminator created by splitBB. (#125897 ) Fixes #125088. When splitBB is called with createBranch=true, it creates a branch instruction in the old block. But no debug loc is set on that branch instruction. If that is used as InsertPoint in the restoreIP, it has the potential to set the current debug location to null and subsequent instructions will come out without a debug location. This caused the verification check to fail as shown in the bug report. This PR changes the splitBB and spliceBB functions to also take a debugLoc parameter which can be used to set the debug location of the branch instruction.	2025-02-05 22:35:43 +00:00
Abid Qadeer	e151b1d1f6	[MLIR][OpenMP] Use correct DebugLoc in target construct callbacks. (#125856 ) This is the same as PR #125106 which somehow is stuck in a "Processing Update" loop for many hours now. I am going to close that one and push this one instead. While working on https://github.com/llvm/llvm-project/issues/125088, I noticed a problem with the TargetBodyGenCallbackTy and TargetGenArgAccessorsCallbackTy. The OMPIRBuilder and MLIR side both maintain their own IRBuilder and when control goes from one to the other, we have to take care to not use a stale debug location. The code currently relies on restoreIP to set the insertion point and the debug location. But if the passed InsertPointTy has an empty block, then the debug location will not be updated (see SetInsertPoint). This can cause an invalid debug location to be attached to instructions and the verifier will complain. Similarly when we exit the callback, the debug location of the Builder is not set to what it was before the callback. This again can cause verification failures. This PR resets the debug location at the start and also uses an InsertPointGuard to restore the debug location at exit. Both of these problems would have been caught by the unit tests but they were not setting the debug location of the builder before calling the createTarget so the problem was hidden. I have updated the tests accordingly.	2025-02-05 14:59:37 +00:00
Jeremy Morse	81d18ad864	[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction's as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction's and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.	2025-01-27 16:27:54 +00:00
Alex MacLean	07ed8187ac	[OpenMP] Replace nvvm.annotation usage with kernel calling conventions (#122320 ) Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling convention is more idiomatic and compile-time performant than using the `nvvm.annotations !"kernel"` metadata. Transition OMPIRBuilder to use calling conventions for PTX kernels and no longer emit `nvvm.annotations`. Update OpenMPOpt to work with kernels specified via calling convention as well as metadata. Update OpenMP tests to use the calling conventions.	2025-01-24 16:56:10 -08:00
Jeremy Morse	6292a808b3	[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>	2025-01-24 13:27:56 +00:00
Mats Jun Larsen	7bb949ec61	[IR][unittests] Replace of PointerType::getUnqual(Type) with opaque version (NFC) (#123901 ) Follow up to https://github.com/llvm/llvm-project/issues/123569	2025-01-22 18:02:51 +09:00
Sergio Afonso	9bc8828093	[OMPIRBuilder][MLIR] Add support for target 'if' clause (#122478 ) This patch implements support for handling the 'if' clause of OpenMP 'target' constructs in the OMPIRBuilder and updates MLIR to LLVM IR translation of the `omp.target` MLIR operation to make use of this new feature.	2025-01-15 10:16:19 +00:00
Sergio Afonso	d0b641b7e2	[OMPIRBuilder] Propagate attributes to outlined target regions (#117875 ) This patch copies the target-cpu and target-features attributes of functions containing target regions into the corresponding outlined function holding the target region. This mirrors what is currently being done for all other outlined functions through the `CodeExtractor` in `OpenMPIRBuilder::finalize()`.	2025-01-14 12:35:50 +00:00
Sergio Afonso	fabc443e93	[OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode (#116051 ) This patch introduces a `TargetKernelRuntimeAttrs` structure to hold host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count values passed to the runtime kernel offloading call. Additionally, kernel type information is used to influence target device code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which provides more granularity.	2025-01-14 12:34:37 +00:00
Sergio Afonso	27bc6bdaba	[OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050 ) This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs` structure used to simplify passing default and constant values for number of teams and threads, and possibly other target kernel-related information in the future. This is used to forward values passed to `createTarget` to `createTargetInit`, which previously used a default unrelated set of values.	2025-01-14 11:08:55 +00:00
Sergio Afonso	b79ed8729b	[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863 ) The preprocessor definition used to enable asserts and the one that `llvm::Error` and `llvm::Expected` use to ensure all created instances are checked are not the same. By making these checks inside of an `assert` in cases where errors are not expected, certain build configurations would trigger runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF -DLLVM_UNREACHABLE_OPTIMIZE=ON`). The `llvm::cantFail()` function, which was intended for this use case, is used by this patch in place of `assert` to prevent these runtime failures. In tests, new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and `EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release builds.	2025-01-09 10:28:16 +00:00
Kaviya Rajendiran	d3eb65f15d	[MLIR][OpenMP] Lowering aligned clause to LLVM IR for SIMD directive (#119536 ) This patch, - Added a translation support for aligned clause in SIMD directive by passing the alignment details to "llvm.assume" intrinsic. - Updated the insertion point for llvm.assume intrinsic call in "OMPIRBuilder.cpp". - Added a check in aligned clause MLIR lowering, to ensure that the alignment value must be a power of 2.	2025-01-03 16:22:38 +05:30
Kareem Ergawy	f9734b9df1	[mlir][OpenMP] - MLIR to LLVMIR translation support for delayed privatization of allocatables in `omp.target` ops (#116576 ) This PR adds support to translate the `private` clause from MLIR to LLVMIR when used on allocatables in the context of an `omp.target` op. This replaces https://github.com/llvm/llvm-project/pull/113208. Parent PR: https://github.com/llvm/llvm-project/pull/116770. Only the latest commit is relevant to the PR.	2024-12-12 14:39:58 +01:00
Sergio Afonso	d87964de78	[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533 ) This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback from leaving the IR in an invalid state and still continuing the codegen process, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue always returning 'success'.	2024-10-25 11:30:16 +01:00
Youngsuk Kim	2a65f081b6	[llvm][OpenMPIRBuilderTest] Avoid Type::getPointerTo() (NFC) (#111196 ) `llvm::Type::getPointerTo()` is to be deprecated & removed soon.	2024-10-04 16:28:08 -04:00
Jay Foad	eb6e7e8f89	[unittests] Use {} instead of std::nullopt to initialize empty ArrayRef (#109388 ) Follow up to #109133.	2024-09-21 10:59:50 +01:00
Abid Qadeer	9e08db796b	[OpenMPIRBuilder] Don't drop debug info for target region. (#80692 ) When an outlined function is generated for omp target region, a corresponding DISubprogram was not being generated. This resulted in all the debug information for the target region being dropped. This commit adds DISubprogram for the outlined function if there is one available for the parent function. It also updates the current debug location so that the right scope is used for the entries in the outlined function. There are places in the OpenMPIRBuilder which change the insertion point but don't update the debug location accordingly. They cause issues when debug info is enabled. I have fixed a few that I observed to cause issues. But there may be more and a systematic cleanup may be required. With this change in place, I can set source line breakpoints in the target region and run to them in the debugger.	2024-09-04 10:16:14 +01:00
Sergio Afonso	84b1e59580	[MLIR][OpenMP][OMPIRBuilder] Add lowering support for omp.target_triples (#100156 ) This patch modifies MLIR to LLVM IR lowering of the OpenMP dialect to take into consideration the contents of the `omp.target_triples` module attribute while generating code for `omp.target` operations. It adds the `OpenMPIRBuilderConfig::TargetTriples` field and initializes it using the `amendOperation` flow of the `OpenMPToLLVMIRTranslation` pass. Some changes are introduced into the `OpenMPIRBuilder` to allow passing the information about whether a target region is intended to be offloaded from outside. The result of this change is that offloading calls are only generated when the `--offload-arch` or `-fopenmp-targets` options are given to the compiler. Otherwise, only the host fallback code is generated. This fixes linker errors currently triggered by `flang-new` if a source file containing a `target` construct is compiled without any of the aforementioned options. Several unit tests impacted by these changes, which are intended to check host code generated for `omp.target` operations, are updated to contain the new attribute. Without it, no calls to `__tgt_target_kernel` and associated control flow operations are generated. Fixes #100209.	2024-08-02 11:58:40 +01:00
Pranav Bhandarkar	5b4e5f8ac6	[OpenMPIRBuilder][Clang][NFC] - Combine `emitOffloadingArrays` and `emitOffloadingArraysArgument` in OpenMPIRBuilder (#97088 ) This patch introduces a new interface in `OpenMPIRBuilder` that combines the creation of the so-called offloading pointer arrays and their subsequent preparation as arguments to the OpenMP runtime library. We then use this in Clang. This is intended to be used in the near future by other frontends such as Flang when lowering MLIR to LLVMIR.	2024-07-25 16:28:11 -05:00
Akash Banerjee	6b1c51bc05	[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343 ) This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions. Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>	2024-06-26 20:18:38 +01:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Tom Eccles	74a87548e5	[flang][MLIR][OpenMP] make reduction by-ref toggled per variable (#92244 ) Fixes #88935 Toggling reduction by-ref broke when multiple reduction clauses were used. Decisions made for the by-ref status for later clauses could then invalidate decisions for earlier clauses. For example, ``` reduction(+:scalar,scalar2) reduction(+:array) ``` The first clause would choose by value reduction and generate by-value reduction regions, but then after this the second clause would force by-ref to support the array argument. But by the time the second clause is processed, the first clause has already had the wrong kind of reduction regions generated. This is solved by toggling whether a variable should be reduced by reference per variable. In the above example, this allows only `array` to be reduced by ref.	2024-05-16 15:27:59 +01:00
Sergio Afonso	3eb0ba34b0	[MLIR][Flang][OpenMP] Make omp.simdloop into a loop wrapper (#87365 ) This patch updates the definition of `omp.simdloop` to enforce the restrictions of a wrapper operation. It has been renamed to `omp.simd`, to better reflect the naming used in the spec. All uses of "simdloop" in function names have been updated accordingly. Some changes to Flang lowering and OpenMP to LLVM IR translation are introduced to prevent the introduction of compilation/test failures. The eventual long term solution might be different.	2024-04-17 11:28:30 +01:00
Joseph Huber	470aefb240	[Offload][NFC] Remove `omp_` prefix from offloading entries (#88071 ) Summary: These entries are generic for offloading with the new driver now. Having the `omp` prefix was a historical artifact and is confusing when used for CUDA. This patch just renames them for now, future patches will rework the binary format to make it more common.	2024-04-09 15:50:15 -05:00
Akash Banerjee	e9da5f0083	[OpenMP] Fix target data region codegen being omitted for device pass (#85218 ) This patch enables the BodyCodeGen callback to still trigger for the TargetData nested region during the device pass. There may be Target code nested within the TargetData region for which this is required. Also add tests for the same.	2024-03-19 13:04:23 +00:00
Leandro Lupori	64422cf826	[llvm][mlir][OMPIRBuilder] Translate omp.single's copyprivate (#80488 ) Use the new copyprivate list from omp.single to emit calls to __kmpc_copyprivate, during the creation of the single operation in OMPIRBuilder. This is patch 4 of 4, to add support for COPYPRIVATE in Flang. Original PR: https://github.com/llvm/llvm-project/pull/73128	2024-02-28 13:33:42 -03:00
agozillon	dcf4ca558c	[OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818 ) This patch seeks to add a mechanism to raise constant (not ConstantExpr or runtime/dynamic) sized allocations into the entry block for select functions that have been inserted into a list for processing. This processing occurs during the finalize call, after OutlinedInfo regions have completed. This currently has only been utilised for createOutlinedFunction, which is triggered for TargetOp generation in the OpenMP MLIR dialect lowering to LLVM-IR. This currently is required for Target kernels generated by createOutlinedFunction to avoid subsequent optimization passes doing some unintentional malformed optimizations for AMD kernels (unsure if it occurs for other vendors). If the allocas are generated inside of the kernel and are not in the entry block and are subsequently passed to a function this can lead to required instructions being erased or manipulated in a way that causes the kernel to run into a HSA access error. This fix is related to a series of problems found in: https://github.com/llvm/llvm-project/issues/74603 This problem primarily presents itself for Flang's HLFIR AssignOp currently, when utilised with a scalar temporary constant on the RHS and a descriptor type on the LHS. It will generate a call to a runtime function, wrap the RHS temporary in a newly allocated descriptor (an llvm struct), and pass both the LHS and RHS descriptor into the runtime function call. This will currently be embedded into the middle of the target region in the user entry block, which means the allocas are also embedded in the middle, which seems to pose issues when later passes are executed. This issue may present itself in other HLFIR operations or unrelated operations that generate allocas as a byproduct, but for the moment, this one test case is the only scenario in which I've found this problem.
Perhaps this is not the appropriate fix, I am very open to other suggestions, I've tried a few others (at varying levels of the flang/mlir compiler flow), but this one is the smallest and least intrusive change set. The other two, that come to mind (but I've not fully looked into, the former I tried a little with blocks but it had a few issues I'd need to think through): - Having a proper alloca only block (or region) generated for TargetOps that we could merge into the entry block that's generated by convertTarget's createOutlinedFunction. - Or diverging a little from Clang's current target generation and using the CodeExtractor to generate the user code as an outlined function region invoked from the kernel we make, with our kernel arguments passed into it. Similar to the current parallel generation. I am not sure how well this would intermingle with the existing parallel generation though that's layered in. Both of these methods seem like quite a divergence from the current status quo, which I am not entirely sure is merited for the small test this change aims to fix.	2024-02-23 22:59:41 +01:00
Joseph Huber	cc374d8056	[OpenMP] Remove `register_requires` global constructor (#80460 ) Summary: Currently, OpenMP handles the `omp requires` clause by emitting a global constructor into the runtime for every translation unit that requires it. However, this is not a great solution because it prevents us from having a defined order in which the runtime is accessed and used. This patch changes the approach to no longer use global constructors, but to instead group the flag with the other offloading entries that we already handle. This has the effect of still registering each flag per requires TU, but now we have a single constructor that handles everything. This function removes support for the old `__tgt_register_requires` and replaces it with a warning message. We just had a recent release, and the OpenMP policy for the past four releases since we switched to LLVM is that we do not provide strict backwards compatibility between major LLVM releases now that the library is versioned. This means that a user will need to recompile if they have an old binary that relied on `register_requires` having the old behavior. It is important that we actively deprecate this, as otherwise it would not solve the problem of having no defined init and shutdown order for `libomptarget`. The problem of `libomptarget` not having a defined init and shutdown order cascades into a lot of other issues so I have a strong incentive to be rid of it. It is worth noting that the current `__tgt_offload_entry` only has space for a 32-bit integer here. I am planning to overhaul these at some point as well.	2024-02-21 11:33:32 -06:00
Kazu Hirata	5c9d82de6b	[llvm] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 22:46:02 -08:00
Dominik Adamski	bb4484d41e	[OpenMPIRBuilder] Add support for target workshare loops (#73360 ) The workshare loop for target region uses the new OpenMP device runtime. The code generation scheme for the new device runtime is presented below: Input code: ``` workshare-loop { loop-body } ``` Output code: helper function which represents loop body: ``` function-loop-body(counter, loop-body-args) { loop-body } ``` workshare-loop is replaced by the proper device runtime call: ``` call __kmpc_new_worksharing_rtl(function-loop-body, loop-body-args, loop-tripcount, ...) ``` This PR uses the new device runtime functions which were added in PR: https://github.com/llvm/llvm-project/pull/73225	2023-12-06 09:47:09 +01:00
Fangrui Song	dd3184c30f	[unittest,examples] Replace uses of IRBuilder::getInt8PtrTy with getPtrTy. NFC	2023-11-27 08:29:13 -08:00
Paulo Matos	7b9d73c2f9	[NFC] Remove Type::getInt8PtrTy (#71029 ) Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.	2023-11-07 17:26:26 +01:00
Dominik Adamski	2cce0f6c57	[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000 ) Added support for LLVM IR code generation which is used for handling omp target parallel code. The call for __kmpc_parallel_51 is generated and the parallel region is outlined to separate function. The proper setup of kmpc_target_init mode is not included in the commit. It is assumed that the SPMD mode for target initialization is properly set by other codegen functions.	2023-11-06 11:44:00 +01:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401 ) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Shraiysh	9922aadf9e	[OpenMPIRBuilder] Added `if` clause for `teams` (#69139 ) This patch adds support for the `if` clause on `teams` construct. The value of the argument must be an integer value. If the value evaluates to a true (non-zero) integer, then the number of threads is determined by the `num_threads` clause (or default and ICV if `num_threads` is absent). When the condition evaluates to false (zero), then the bounds are set to 1. ([OpenMP 5.2 Section 10.2](https://www.openmp.org/spec-html/5.2/openmpse58.html)) This essentially means that ``` upperbound = ifexpr ? upperbound : 1 lowerbound = ifexpr ? lowerbound : 1 ```	2023-10-17 15:00:39 -05:00
Shraiysh	e41eaf4896	[OpenMPIRBuilder] Add ThreadLimit and NumTeams clauses to teams construct (#68364 ) This patch adds support for `thread_limit` and bounds on `num_teams` clause for the teams construct in OpenMP. Added testcases for the same.	2023-10-11 10:36:03 -05:00
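
As a rough illustration of the scan reduction loop described in the 49ccf46adc entry above, here is a minimal single-threaded C++ sketch of the doubling-stride pass over the temporary buffer. It is not the code the OpenMPIRBuilder emits: there is no OpenMP, no `masked` region and no barrier, and the names `scanInPlace` and `inclusiveScan` are hypothetical helpers invented for this example. `stride` plays the role of `i = pow(2, k)` in the commit's pseudocode.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not an OpenMPIRBuilder API): in-place inclusive scan
// using the doubling-stride loop from the commit message's pseudocode.
template <typename T, typename Op>
void scanInPlace(std::vector<T> &buffer, Op op) {
  const std::size_t n = buffer.size();
  // stride takes the values 1, 2, 4, ..., giving ceil(log2(n)) passes.
  for (std::size_t stride = 1; stride < n; stride *= 2) {
    // Walk downwards so buffer[cnt - stride] still holds the value from the
    // previous pass when it is read.
    for (std::size_t cnt = n - 1; cnt >= stride; --cnt)
      buffer[cnt] = op(buffer[cnt], buffer[cnt - stride]);
  }
}

// Hypothetical driver mirroring the three phases of the generated code:
// input phase (fill the buffer), scan reduction, scan phase (read back).
template <typename T, typename Op>
std::vector<T> inclusiveScan(const std::vector<T> &input, Op op) {
  // Input phase: buffer[i] = red, where red is this iteration's value.
  std::vector<T> buffer(input.begin(), input.end());

  // Scan reduction: done by a single thread in the generated code.
  scanInPlace(buffer, op);

  // Scan phase: red = buffer[i]; here the values are simply collected.
  std::vector<T> result;
  result.reserve(buffer.size());
  for (std::size_t i = 0; i < buffer.size(); ++i)
    result.push_back(buffer[i]);

  assert(result.size() == input.size());
  return result;
}
```

Calling `inclusiveScan(std::vector<int>{1, 2, 3, 4, 5}, std::plus<int>())` under these assumptions yields the running sums `{1, 3, 6, 10, 15}`. The downward inner loop is what lets the scan run in place without a second buffer.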

191 Commits