396 Commits

Author SHA1 Message Date
arsnyder16
f7b2c2e49f
[openmp][WebAssembly] Allow openmp to compile and run under emscripten toolchain (#95169)
* Separate wasi and emscripten as they have different constraints and
abilities
* Emscripten mimics Linux/POSIX by statically linking the musl runtime.
This allow nearly all KMP_OS_LINUX code paths to work correctly. There
are only a few places that need to be adjusted related to dynamic
linking (dl_open)
* Internally link openmp globals
* With CommonLinkage it is needed to emit them in an assembly file, now
they are defined and used within each compilation unit
* With ExternalLinkage they suffer from duplicate symbols during linking
for unnamed globals like reduction/critical
   * Interestingly this aligns with the TODO comment above this code
2024-08-07 13:00:37 -05:00
Sergio Afonso
84b1e59580
[MLIR][OpenMP][OMPIRBuilder] Add lowering support for omp.target_triples (#100156)
This patch modifies MLIR to LLVM IR lowering of the OpenMP dialect to take into
consideration the contents of the `omp.target_triples` module attribute while
generating code for `omp.target` operations.

It adds the `OpenMPIRBuilderConfig::TargetTriples` field and initializes it
using the `amendOperation` flow of the `OpenMPToLLVMIRTranslation` pass. Some
changes are introduced into the `OpenMPIRBuilder` to allow passing the
information about whether a target region is intended to be offloaded from
outside.

The result of this change is that offloading calls are only generated when the
`--offload-arch` or `-fopenmp-targets` options are given to the compiler.
Otherwise, only the host fallback code is generated. This fixes linker errors
currently triggered by `flang-new` if a source file containing a `target`
construct is compiled without any of the aforementioned options.

Several unit tests impacted by these changes, which are intended to check host
code generated for `omp.target` operations, are updated to contain the new
attribute. Without it, no calls to `__tgt_target_kernel` and associated control
flow operations are generated.

Fixes #100209.
2024-08-02 11:58:40 +01:00
Johannes Doerfert
f3bfc56327
[Offload][OpenMP] Prettify error messages by "demangling" the kernel name (#101400)
The kernel names for OpenMP are manually mangled and not ideal when we
report something to the user. We demangle them now, providing the
function and line number of the target region, together with the actual
kernel name.
2024-08-01 15:24:15 -07:00
Pranav Bhandarkar
5b4e5f8ac6
[OpenMPIRBuilder][Clang][NFC] - Combine emitOffloadingArrays and emitOffloadingArraysArgument in OpenMPIRBuilder (#97088)
This patch introduces a new interface in `OpenMPIRBuilder` that combines
the creation of the so-called offloading pointer arrays and their
subsequent preparation as arguments to the OpenMP runtime library. We
then use this in Clang. 

This is intended to be used in the near future
by other frontends such as Flang when lowering MLIR to LLVMIR.
2024-07-25 16:28:11 -05:00
Akash Banerjee
0613454012
[OpenMP] Fix OpenMPIRBuilder generating incorrect duplicate SrcLocInfo (#100364)
This should further fix some of the incorrect debug info being generated
related to #97458
2024-07-25 17:46:18 +01:00
Johannes Doerfert
3c8efd7928
[OpenMP] Ensure the actual kernel is annotated with launch bounds (#99927)
In debug mode there is a wrapper (the kernel) around the function in
which we generate the kernel code. We worked around this before to get
the correct kernel name, but now we really distinguish both to attach
the launch bounds to the kernel, not the inner function.
2024-07-23 09:02:47 -07:00
Pranav Bhandarkar
d7e185cca9
[OMPIRBuilder] - Handle dependencies in createTarget (#93977)
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.

The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function has changed now to check if an outer
target-task needs to be materialized (i.e if `target` construct has
`nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to
do all the heavy lifting for creating and dispatching the task.
3) `emitTargetTask` - Bulk of the change is here. See the large comment
explaining what it does at the beginning of this function
2024-07-22 10:56:45 -05:00
Gheorghe-Teodor Bercea
1a478a69bc
[OpenMP][offload] Fix dynamic schedule tracking (#97065)
This patch fixes the dynamic schedule tracking.
2024-07-01 10:23:11 -04:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Akash Banerjee
6b1c51bc05
[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343)
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions.

Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
2024-06-26 20:18:38 +01:00
Nikita Popov
b6a94b6bfb [OMPIRBuilder] Use SmallPtrSet::remove_if() (NFC) 2024-06-26 14:43:01 +02:00
agozillon
aec735cf47
[Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829)
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.

The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.

For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.

In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).

In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.

Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
2024-06-25 20:54:04 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Nikita Popov
f1075a34ab [FileSystem] Avoid <stack> include (NFC)
The standard pattern in LLVM is to directly use vectors for stacks,
without an additional std::stack wrapper to rename some methods.
2024-06-21 13:44:46 +02:00
Nikita Popov
36c6632eb4
[IR] Don't include PassInstrumentation.h in PassManager.h (NFC) (#96219)
Move PassInstrumentationAnalysis into PassInstrumentation.h and stop
including it in PassManager.h (effectively inverting the direction of
the dependency).

Most places using PassManager are not interested in PassInstrumentation,
and we no longer have any uses of it in PassManager.h itself (only in
PassManagerImpl.h).
2024-06-21 08:41:16 +02:00
Mats Petersson
e5f1639342
[Flang]Fix for changed code at the end of AllocaIP. (#92430)
Some of the OpenMP code can change the instruction pointed at by the
insertion point. This leads to an assert in the compiler about
BB->getParent() and IP->getParent() not matching.

The fix is to rebuild the insertionpoint from the block, rather than use
builder.restoreIP.

Also, move some of the alloca generation, rather than skipping back and
forth between insert points (and ensure all the allocas are done before
their users are created).

A simple test, mainly to ensure the minimal reproducer doesn't fail to
compile in the future is also added.
2024-06-18 21:10:41 +01:00
agozillon
0aeaa2d93d
[OMPIRBuilder][OpenMP][LLVM] Modify and use ReplaceConstant utility in convertTarget (#94541)
This PR seeks to expand/replace the Constant -> Instruction conversion
that needs to occur inside of the OpenMP Target kernel generation to
allow kernel argument replacement of uses within the kernel (cannot
replace constant uses within constant expressions with non-constants).
It does so by making use of the new-ish utility
convertUsersOfConstantsToInstructions which is a much more expansive
version of what the smaller "version" of the function I wrote does,
effectively expanding uses of the input argument that are constant
expressions into instructions so that we can replace with the
appropriate kernel argument.

Also alters convertUsersOfConstantsToInstructions to optionally
restrict the replacement to a function and optionally leave
dead constants alone, the latter is necessary when lowering from 
MLIR as we cannot be sure we can remove the constants at this 
stage, even if rewritten to instructions the ModuleTranslation may 
maintain links to the original constants and utilise them in further 
lowering steps (as when we're lowering the kernel, the module is 
still in the process of being lowered). This can result in unusual 
ICEs later. These dead constants can be tidied up later (and 
appear to be in subsequent lowering from checking with 
emit-llvm).
2024-06-13 15:57:15 +02:00
Sameer Sahasrabuddhe
e0ac087ff0
[LoopUnroll] Consider convergence control tokens when unrolling (#91715)
- There is no restriction on a loop with controlled convergent
operations when
  the relevant tokens are defined and used within the loop.

- When a token defined outside a loop is used inside (also called a loop
convergence heart), unrolling is allowed only in the absence of
remainder or
  runtime checks.

- When a token defined inside a loop is used outside, such a loop is
said to be
"extended". This loop can only be unrolled by also duplicating the
extended part
  lying outside the loop. Such unrolling is disabled for now.

- Clean up loop hearts: When unrolling a loop with a heart, duplicating
the
heart will introduce multiple static uses of a convergence control token
in a
cycle that does not contain its definition. This violates the static
rules for
tokens, and needs to be cleaned up into a single occurrence of the
intrinsic.

- Spell out the initializer for UnrollLoopOptions to improve
readability.


Original implementation [D85605] by Nicolai Haehnle
<nicolai.haehnle@amd.com>.
2024-06-06 13:13:46 +05:30
Kareem Ergawy
7db4e6c1ec
[OpenMP][LLVM] Update alloca IP after PrivCB in OMPIRBUIlder (#93920)
Fixes a crash uncovered by
[pr77666.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr77666.f90)
in the test suite when delayed privatization is enabled by default.

In particular, whenever `PrivCB` (the callback responsible for
generating privatizaiton logic for an OMP variable) generates a
multi-block privatization region, the insertion point diverges: the BB
component of the IP can become a different BB from the parent block of
the instruction iterator component of the IP. This PR updates the IP to
make sure that the BB is the parent block of the instruction iterator.
2024-06-05 05:13:47 +02:00
Tom Eccles
74a87548e5
[flang][MLIR][OpenMP] make reduction by-ref toggled per variable (#92244)
Fixes #88935

Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,

```
reduction(+:scalar,scalar2) reduction(+:array)
```

The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.

This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
2024-05-16 15:27:59 +01:00
Pranav Bhandarkar
13cd88108f
[mlir][OpenMP] - Honor dependencies in code-generation of the if clause in omp.task correctly (#90891)
This patch fixes the code generation of the if clause, specifically when the
condition evaluates to false and when the task directive has the depend
clause on it. When the if clause of a task construct evaluates to false,
then the task is an undeferred task. This undeferred task still has to
honor dependencies. Previously, the OpenMPIRbuilder didn't honor
dependencies. This patch fixes that.

Fixes https://github.com/llvm/llvm-project/issues/90869
2024-05-13 08:54:23 -05:00
Abid Qadeer
6419496549
[flang][OMPIRBuilder] Keep debug location in sync with insert point. (#89953)
A customer reported an issue which I have reduced to the test in the PR.
If built with debug info enabled, the build fails with the following
error in the verifier.

!dbg attachment points at wrong subprogram for function

The problem happened because some of the functions in OMPIRBuilder.cpp
updated the insertion point with the passed in location but did not
change the current debug location. This caused a stale debug location to
be attached to the instruction.

I have solved it by replacing restoreIP with updateToLocation which
updates both the insertion point and debug location. The
updateToLocation is used in many places already, so this PR brings
functions that I have changed in line with rest of the file.

Slight issue is that I am not checking the return type of
updateToLocation as there is no good value I could return in that case.
But if we have a condition where updateToLocation will return false,
these functions will fail in any case.

I have added a test that checks that build does not fail. I was not sure
what is the correct location for the test should be. Happy to move it to
more appropriate location.
2024-05-10 10:12:24 +01:00
Kareem Ergawy
922ab7089b
[MLIR][OpenMP] Extend omp.private materialization support: dealloc (#90841)
Extends current support for delayed privatization during translation to
LLVM IR. This adds support for materlizaing the `dealloc` region in
`omp.private` ops when this region contains clean-up/deallocation logic
that needs to be executed at the end of the parallel region.

This changes the `OMPIRBuilder` slightly to execute the finalization
callback **after** the privatization callback. This allows us to collect
information about privatized variables on the MLIR and LLVM sides so
that we can properly emit deallocation logic.
2024-05-03 08:59:01 +02:00
Joseph Huber
0287a5cc4e
[OpenMP] Remove 'minncta' attributes from NVPTX kernels (#88398)
Summary:
Currently we treat this attribute as a minimum number for the amount of
blocks scheduled on the kernel. However, the doucmentation states that
this applies to CTA's mapped onto a *single* SM. Currently we just set
it to the total number of blocks, which will almost always result in a
warning that the value is out of range and will be ignored. We don't
have a good way to automatically know how many CTAs can be put on a
single SM nor if we should do this, so we should probably leave this up
to users manually adding it.


https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm
2024-04-15 15:28:05 -05:00
Joseph Huber
2650375b3b
[OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels (#87695)
Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known
2024-04-05 07:38:01 -05:00
Stephen Tozer
9a96fb4445 Reapply "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)"
Fixes a build error caused by an unupdated getAsInstruction callsite in clang.

This reverts commit ab851f7fe946e7eed700ef9d82082eb721860189.
2024-03-19 15:49:10 +00:00
Stephen Tozer
ab851f7fe9 Revert "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)"
Reverted due to buildbot failures:
https://lab.llvm.org/buildbot/#/builders/139/builds/61717/

This reverts commit 7ef433f62c199c414bffdcac1c8ee3159b29c5f5.
2024-03-19 14:41:27 +00:00
Jeremy Morse
7ef433f62c
[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)
Because the RemoveDIs work is putting a debug-info bit into
BasicBlock::iterator and iterators are needed for insertion, the
getAsInstruction method declaration would need to use a fully defined
instruction-iterator, which leads to a complicated
header-inclusion-order problem. Much simpler to instead just not insert,
and make it the callers problem to insert.

This is proportionate because there are only four call-sites to
getAsInstruction -- it would suck if we did this everywhere.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>
2024-03-19 14:30:41 +00:00
Akash Banerjee
e9da5f0083
[OpenMP] Fix target data region codegen being omitted for device pass (#85218)
This patch enables the BodyCodeGen callback to still trigger for the
TargetData nested region during the device pass. There maybe Target code
nested within the TargetData region for which this is required.

Also add tests for the same.
2024-03-19 13:04:23 +00:00
Tom Eccles
f46f5a01f4
[flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (#84304)
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.

This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).

Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.

Existing lowerings from mlir are unaffected (these will continue to use
the by-value argument passing.

Flang will continue to pass by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.

Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.

Commit series for by-ref OpenMP reductions 3/3

---------

Co-authored-by: Mats Petersson <mats.petersson@arm.com>
2024-03-13 14:51:09 +00:00
Leandro Lupori
64422cf826
[llvm][mlir][OMPIRBuilder] Translate omp.single's copyprivate (#80488)
Use the new copyprivate list from omp.single to emit calls to
__kmpc_copyprivate, during the creation of the single operation
in OMPIRBuilder.

This is patch 4 of 4, to add support for COPYPRIVATE in Flang.
Original PR: https://github.com/llvm/llvm-project/pull/73128
2024-02-28 13:33:42 -03:00
agozillon
dcf4ca558c
[OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818)
This patch seeks to add a mechanism to raise constant (not ConstantExpr
or runtime/dynamic) sized allocations into the entry block for select
functions that have been inserted into a list for processing. This
processing occurs during the finalize call, after OutlinedInfo regions
have completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for TargetOp generation in
the OpenMP MLIR dialect lowering to LLVM-IR.

This currently is required for Target kernels generated by
createOutlinedFunction to avoid subsequent optimization passes doing
some unintentional malformed optimizations for AMD kernels (unsure if it
occurs for other vendors). If the allocas are generated inside of the
kernel and are not in the entry block and are subsequently passed to a
function this can lead to required instructions being erased or
manipulated in a way that causes the kernel to run into a HSA access
error.

This fix is related to a series of problems found in:
https://github.com/llvm/llvm-project/issues/74603

This problem primarily presents itself for Flang's HLFIR AssignOp
currently, when utilised with a scalar temporary constant on the RHS and
a descriptor type on the LHS. It will generate a call to a runtime
function, wrap the RHS temporary in a newly allocated descriptor (an
llvm struct), and pass both the LHS and RHS descriptor into the runtime
function call. This will currently be
embedded into the middle of the target region in the user entry block,
which means the allocas are also embedded in the middle, which seems to
pose
issues when later passes are executed. This issue may present itself in
other HLFIR operations or unrelated operations that generate allocas as
a by product, but for the moment, this one test case is the only
scenario I've found this problem.

Perhaps this is not the appropriate fix, I am very open to other
suggestions, I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive change set. The other two, that come to mind (but I've not
fully looked into, the former I tried a little with blocks but it had a
few issues I'd need to think through):

- Having a proper alloca only block (or region) generated for TargetOps
that we could merge into the entry block that's generated by
convertTarget's createOutlinedFunction.
- Or diverging a little from Clang's current target generation and using
the CodeExtractor to generate the user code as an outlined function
region invoked from the kernel we make, with our kernel arguments passed
into it. Similar to the current parallel generation. I am not sure how
well this would intermingle with the existing parallel generation though
that's layered in.

Both of these methods seem like quite a divergence from the current
status quo, which I am not entirely sure is merited for the small test
this change aims to fix.
2024-02-23 22:59:41 +01:00
Joseph Huber
cc374d8056
[OpenMP] Remove register_requires global constructor (#80460)
Summary:
Currently, OpenMP handles the `omp requires` clause by emitting a global
constructor into the runtime for every translation unit that requires
it. However, this is not a great solution because it prevents us from
having a defined order in which the runtime is accessed and used.

This patch changes the approach to no longer use global constructors,
but to instead group the flag with the other offloading entires that we
already handle. This has the effect of still registering each flag per
requires TU, but now we have a single constructor that handles
everything.

This function removes support for the old `__tgt_register_requires` and
replaces it with a warning message. We just had a recent release, and
the OpenMP policy for the past four releases since we switched to LLVM
is that we do not provide strict backwards compatibility between major
LLVM releases now that the library is versioned. This means that a user
will need to recompile if they have an old binary that relied on
`register_requires` having the old behavior. It is important that we
actively deprecate this, as otherwise it would not solve the problem of
having no defined init and shutdown order for `libomptarget`. The
problem of `libomptarget` not having a define init and shutdown order
cascades into a lot of other issues so I have a strong incentive to be
rid of it.

It is worth noting that the current `__tgt_offload_entry` only has space
for a 32-bit integer here. I am planning to overhaul these at some point
as well.
2024-02-21 11:33:32 -06:00
Sergio Afonso
bc82d1a6b7
[OpenMPIRBuilder][MLIR] Pass target-cpu and target-features to outlined functions (#80283)
This patch adds support for forwarding the target-cpu and
target-features attributes to functions outlined in the OpenMPIRBuilder.
This, in turn, results in the addition of these attributes for functions
created during the translation of the `omp.parallel`, `omp.task` and
`omp.teams` operations, and for the `omp.wsloop` operation when doing
codegen for an OpenMP target device.
2024-02-05 12:22:56 +00:00
Dominik Adamski
b4370140b4
[OpenMPIRBuilder] Do not call host runtime for GPU teams codegen (#79984)
Patch ensures that host runtime functions are not called for handling
OpenMP teams clause on the device.

GPU code for pragma `omp target teams distribute parallel do` will
require only one call to OpenMP loop-worksharing GPU runtime. Support
for it will be added later.

This patch does not include changes required for handling `omp target
teams` for the host side.
2024-01-31 12:16:35 +01:00
Dominik Adamski
21199f9842
[OpenMP][OMPIRBuilder] Fix LLVM IR codegen for collapsed device loop (#78708)
When we generate the loop body function, we need to be sure, that all
original loop counters are replaced by the new counter.

We need to save all items which use the original loop counter and then
perform replacement of the original loop counter. If we don't do it,
there is a risk that some values are not updated.
2024-01-22 09:24:45 +01:00
SunilKuravinakop
49ee8b53ef
[OpenMP] atomic compare fail : Codegen support (#75709)
This is a continuation of https://reviews.llvm.org/D123235 ([OpenMP]
atomic compare fail : Parser & AST support). In this branch Support for
codegen support for atomic compare fail is being added.

---------

Co-authored-by: Sunil Kuravinakop
2024-01-02 22:46:02 +05:30
Andrew Brown
68ea91dd8b
[openmp][wasm] Allow compiling OpenMP to WebAssembly (#71297)
This change allows building the static OpenMP runtime, `libomp.a`, as
WebAssembly. It builds on the work done in [D142593] but goes further in
several ways:
 - it makes the OpenMP CMake files more WebAssembly-aware
- it conditions much more code (or code that had been refactored since
[D142593]) for `KMP_ARCH_WASM` and `KMP_OS_WASI`
- it fixes a Clang crash due to unimplemented common symbols in
WebAssembly

The commit messages have more details. Please understand this PR as a
start, not the completed work, for WebAssembly support in OpenMP.
Getting the tests running somehow would be a good next step, e.g.; but
what is contained here works, at least with recent versions of
[wasi-sdk] and engines that support [wasi-threads]. I suspect the same
is true for Emscripten and browsers, but I have not tested that
workflow.

[D142593]: https://reviews.llvm.org/D142593
[wasi-sdk]: https://github.com/WebAssembly/wasi-sdk
[wasi-threads]: https://github.com/WebAssembly/wasi-threads

---------

Co-authored-by: Atanas Atanasov <atanas.atanasov@intel.com>
2023-12-14 13:48:01 -06:00
Joseph Huber
97f3be2c5a
[CUDA][HIP] Improve variable registration with the new driver (#73177)
Summary:
This patch adds support for registering texture / surface variables from
CUDA / HIP. Additionally, we now properly track the `extern` and `const`
flags that are also used in these runtime functions.

This does not implement the `managed` variables yet as those seem to
require some extra handling I'm not familiar with. The issue is that the
current offload entry isn't large enough to carry size and alignment
information along with an extra global.
2023-12-07 15:44:23 -06:00
Dominik Adamski
bb4484d41e
[OpenMPIRBuilder] Add support for target workshare loops (#73360)
The workshare loop for target region uses the new OpenMP device runtime.
The code generation scheme for the new device runtime is presented
below:

Input code:
```
workshare-loop {
  loop-body
}
```

Output code:
helper function which represents loop body:
```
function-loop-body(counter, loop-body-args) {
  loop-body
}
```
workshare-loop is replaced by the proper device runtime call:
```
call __kmpc_new_worksharing_rtl(function-loop-body, loop-body-args,
                                                     loop-tripcount, ...)
```
This PR uses the new device runtime functions which were added in PR:
https://github.com/llvm/llvm-project/pull/73225
2023-12-06 09:47:09 +01:00
Jorge Gorbe Moya
ce9b72c979 [NFC] Fix unused variable (used only in assert) after d1cdcddcc2ef712c4e2ab61c6e4ca83350e7e9e3 2023-12-04 14:15:24 -08:00
Youngsuk Kim
d1cdcddcc2 [llvm][OMPIRBuilder] Remove no-op ptr-to-ptr bitcast (NFC)
Opaque ptr cleanup effort
2023-12-04 15:06:07 -06:00
Craig Topper
aba040182a
[IR] Replace uses of IRBuilder::getInt8PtrTy with getPtrTy. NFC (#73154) 2023-11-22 12:24:18 -08:00
Mike Rice
3ce5c04ad0
Replace getAs with castAs, dyn_cast with cast (NFC) (#72600)
Make the code clear that nullptrs are not expected.
2023-11-17 09:22:33 -08:00
agozillon
718793ce6a
[OpenMP][OMPIRBuilder] Handle replace uses of ConstantExpr's inside of Target regions (#71891)
Currently there's an edge cases where constant indexing in target
regions can lead to incorrect results as we do not correctly replace
uses of mapped variables in generated target functions with the target
arguments (and accessor instructions) that replace them. This patch
seeks to fix that by extending the current logic in the OMPIRBuilder.

Things like GEP's can come in the form of Constants/ConstantExprs,
Constants and ConstantExpr's do not have access to the knowledge of what
they're contained in, so we must dig a little to find an instruction so
we can tell if they're used inside of the function we're outlining so we
can be sure they are replaceable and we are not accidentally replacing a
usage somewhere else in the module that's still necessary.

This patch handles these by replacing the original constant expression
with a new instruction equivalent; an instruction as it allows easy
modification in the following loop, as we can now know the constant
(instruction) is owned by our target function (as it holds this
knowledge) and replaceUsesOfWith can now be invoked on it (cannot do
this with constants it seems), a brand new one also allows us to be
cautious as it is perhaps possible the old expression was used inside of
the function but exists and is used externally (unlikely by the nature
of a Constant, but still a positive side affect).
2023-11-15 15:45:32 +01:00
Akash Banerjee
767b34297d
[OpenMP] Mute OpenMP Target Enter, Exit and Data codegen for device pass (#72287) 2023-11-15 10:44:16 +00:00
Youngsuk Kim
876236023c
[llvm] Remove no-op ptr-to-ptr bitcasts (NFC) (#72133)
Opaque ptr cleanup effort (NFC).
2023-11-13 13:05:27 -05:00
Dominik Adamski
f2f5f1bfb6
[OMPIRBuilder] Do not call __kmpc_push_num_threads for device parallel (#71934)
Function __kmpc_push_num_threads should be called only if we specify
number of threads for host parallel region.

Number of threads specified by the user should be passed as one of
arguments of __kmpc_parallel_51 function.
2023-11-10 20:38:56 +01:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00