106 Commits

Author SHA1 Message Date
Jan Leyonberg
3e86f05621
[OpenMP][flang] Lowering of OpenMP custom reductions to MLIR (#168417)
This patch add support for lowering of custom reductions to MLIR. It
also enhances the capability of the pass to automatically mark functions
as "declare target" by traversing custom reduction initializers and
combiners.
2025-11-24 16:00:46 -05:00
agozillon
173600880b
[Flang][OpenMP][MLIR] Initial declare target to for variables implementation (#119589)
While the infrastructure for declare target to/enter and link for
variables exists in the MLIR dialect and at the Flang level, the current
lowering from MLIR -> LLVM IR isn't in place, it's only in place for
variables that have the link clause applied.

This PR aims to extend that lowering to an initial implementation that
incorporates declare target to as well, which primarily requires changes
in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the
OpenMP dialect was required to extend the declare target enumerator to
include a default None field as well.

This also requires a minor change to the Flang lowering's
MapInfoFinlization.cpp pass to alter the map type for descriptors to
deal with cases where a variable is marked declare to. Currently, when a
descriptor variable is mapped declare target to the descriptor component
can become attatched, and cannot be updated, this results in issues when
an unusual allocation range is specified (effectively an off-by X
error). The current solution is to map the descriptor always, as we
always require an up-to-date version of this data. However, this also
requires an interlinked PR that adds a more intricate type of mapping of
structures/record types that clang currently implements, to circumvent
the overwriting of the pointer in the descriptor.

3/3 required PRs to enable declare target to mapping, this PR should
pass all tests and provide an all green CI.

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-24 21:22:49 +01:00
agozillon
20929abb85
[MLIR][OpenMP] Introduce overlapped record type map support (#119588)
This PR introduces a new additional type of map lowering for record
types that Clang currently supports, in which a user can map a top-level
record type and then individual members with different mapping,
effectively creating a sort of "overlapping" mapping that we attempt to
cut around.

This is currently most predominantly used in Fortran, when mapping
descriptors and there data, we map the descriptor and its data with
separate map modifiers and "cut around" the pointer data, so that wedo
not overwrite it unless the runtime deems it a neccesary action based on
its reference counting mechanism. However, it is a mechanism that will
come in handy/trigger when a user explitily maps a record type (derived
type or structure) and then explicitly maps a member with a different
map type.

These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp
file and phase, however, one Flang test that checks end-to-end IR
compilation (as far as we care for now at least) was altered.

2/3 required PRs to enable declare target to mapping, should look at PR
3/3 to check for full green passes (this one will fail a number due to
some dependencies).

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-24 21:20:29 +01:00
Akash Banerjee
8aa7d823b0
[OpenMP][Flang] Emit default declare mappers implicitly for derived types (#140562)
This patch adds support to emit default declare mappers for implicit
mapping of derived types when not supplied by user. This especially
helps tackle mapping of allocatables of derived types.
2025-11-14 15:59:48 +00:00
Ethan Luis McDonough
38cade7cc6
[PGO][Offload] Fix missing names bug in GPU PGO (#166444)
After #163011 was merged, the tests in
[`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1)
broke because the offload plugins were no longer able to find
`__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm`
visible to the host on GPU targets and reverses the changes made in
f7e9968a5ba99521e6e51161f789f0cc1745193f.
2025-11-10 10:11:53 -06:00
Joseph Huber
aaddd8d38a [OpenMP] Fix tests relying on the heap size variable
Summary:
I made that an unimplemented error, but forgot that it was used for this
environment variable.
2025-11-06 13:00:26 -06:00
Joseph Huber
670c453aeb
[Offload] Remove handling for device memory pool (#163629)
Summary:
This was a lot of code that was only used for upstream LLVM builds of
AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so
just use that. Simplifies code, can be added back if we start providing
alternate forms but I don't think there's a single use-case that would
justify it yet.
2025-11-06 10:15:18 -06:00
agozillon
09318c6bff
[MLIR][OpenMP] Fix and simplify bounds offset calculation for 1-D GEP offsets (#165486)
Currently this is being calculated incorrectly and will result in
incorrect index offsets in more complicated array slices. This PR tries
to address it by refactoring and changing the calculation to be more
correct.
2025-10-31 00:54:31 +01:00
Jan Patrick Lehr
f7e9968a5b
[Offload] XFAIL pgo tests until resolved (#163722)
While people look into it, xfail the tests.
2025-10-16 11:43:55 +02:00
Joseph Huber
914fbe367e
[OpenMP] Disable a few more tests to get the bot green (#163614) 2025-10-15 14:14:15 -05:00
Jan Patrick Lehr
4b84e0f3f0
[OpenMP] Add test to print interop identifiers (#161434)
The test covers some of the identifier symbols in the interop runtime.

This test, for now, is to guard against complete breakage, which was the
result of the other `interop.c` test not being enabled on AMD and thus,
not caught by our buildbots.
2025-10-15 20:38:33 +02:00
agozillon
9155b318f2
[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types (#154349)
This PR adds deferral of descriptor maps until they are necessary for
assumed dummy argument types. The intent is to avoid a problem where a
user can inadvertently map a temporary local descriptor to device
without their knowledge and proceed to never unmap it. This temporary
local descriptor remains lodged in OpenMP device memory and the next
time another variable or descriptor residing in the same stack address
is mapped we incur a runtime OpenMP map error as we try to remap the
same address.

This fix was discussed with the OpenMP committee and applies to OpenMP
5.2 and below, future versions of OpenMP can avoid this issue via the
attach semantics added to the specification.
2025-10-09 17:52:41 +02:00
Akash Banerjee
ed12dc5e30
[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160766)
This PR adds support for nested derived types and their mappers to the
MapInfoFinalization pass.

- Generalize MapInfoFinalization to add child maps for arbitrarily
nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build
full coordinate_of chains; append members with correct membersIndex.

This fixes #156461.
2025-10-02 16:15:16 +00:00
Joseph Huber
0fcce4fb7b
[OpenMP] Mark problematic tests as XFAIL / UNSUPPORTED (#161267)
Summary:
Several of these tests have been failing for literal years. Ideally we
make efforts to fix this, but keeping these broken has had serious
consequences on our testing infrastructure where failures are the norm
so almost all test failures are disregarded. I made a tracking issue for
the ones that have been disabled.

https://github.com/llvm/llvm-project/issues/161265
2025-09-29 15:17:55 -05:00
Dominik Adamski
e4d94f4f7f
[OpenMP][Flang] Fix no-loop test (#161162)
Fortran no-loop test is supported only for GPU.
2025-09-29 16:01:52 +02:00
Dominik Adamski
83ef38a274
[Flang][OpenMP] Enable no-loop kernels (#155818)
Enable the generation of no-loop kernels for Fortran OpenMP code. target
teams distribute parallel do pragmas can be promoted to no-loop kernels
if the user adds the -fopenmp-assume-teams-oversubscription and
-fopenmp-assume-threads-oversubscription flags.

If the OpenMP kernel contains reduction or num_teams clauses, it is not
promoted to no-loop mode.

The global OpenMP device RTL oversubscription flags no longer force
no-loop code generation for Fortran.
2025-09-26 13:57:51 +02:00
Akash Banerjee
3e7e60ae5c
Revert "[Flang][OpenMP] Implicitly map nested allocatable components in derived types" (#160759)
Reverts llvm/llvm-project#160116
2025-09-25 19:53:58 +01:00
Akash Banerjee
b4f1e0e5b1
[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160116)
This PR adds support for nested derived types and their mappers to the
MapInfoFinalization pass.

- Generalize MapInfoFinalization to add child maps for arbitrarily
nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build
full coordinate_of chains; append members with correct membersIndex.
  
This fixes #156461.
2025-09-24 14:30:27 +01:00
Akash Banerjee
8afea0d0ea
[OpenMP][MLIR] Preserve to/from flags in mapper base entry for mappers (#159799)
With declare mapper, the parent base entry was emitted as `TARGET_PARAM`
only. The mapper received a map-type without `to/from`, causing
components to degrade to `alloc`-only (no copies), breaking allocatable
payload mapping. This PR preserves the map-type bits from the parent.

This fixes #156466.
2025-09-19 19:34:09 +01:00
Kareem Ergawy
c286a427b9
[NFC][flang][do concurent] Add saxpy offload tests for OpenMP mapping (#155993)
Adds end-to-end tests for `do concurrent` offloading to the device.


PR stack:
- https://github.com/llvm/llvm-project/pull/155754
- https://github.com/llvm/llvm-project/pull/155987
- https://github.com/llvm/llvm-project/pull/155992
- https://github.com/llvm/llvm-project/pull/155993 ◀️
- https://github.com/llvm/llvm-project/pull/157638
- https://github.com/llvm/llvm-project/pull/156610
- https://github.com/llvm/llvm-project/pull/156837
2025-09-17 07:04:13 +02:00
Jan Patrick Lehr
311d78f2a1
[OpenMP] Fix force-usm test after #157182 (#159095)
The refactoring lead to an additional data transfer. This changes the
assumed transfers in the check-strings to work with that changed
behavior.
2025-09-16 15:42:02 +02:00
agozillon
8f16af3c20
[Flang][OpenMP] Fix mapping of character type with LEN > 1 specified (#154172)
Currently, there's a number of issues with mapping characters with LEN's
specified (strings effectively). They're represented as a char type in
FIR with a len parameter, and then later on they're expanded into an
array of characters when we're translating to the LLVM dialect. However,
we don't generate a bounds for these at lowering. The fix in this PR for
this is to generate a bounds from the LEN parameter and attatch it to
the map on lowering from FIR to the LLVM dialect when we encounter this
type.
2025-09-09 16:36:04 +02:00
Jan Patrick Lehr
27e541645c
[Offload][OpenMP] Enable more tests on AMDGPU (#156626)
(Re)enables a couple of tests that were disabled on AMDGPU for some
reason. Pass for me locally.
2025-09-03 14:04:39 +02:00
Akash Banerjee
6aafe6582d Fix test added in 1fd1d634630754cc9b9c4b5526961d5856f64ff9 2025-08-18 13:29:23 +01:00
Akash Banerjee
1fd1d63463 [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-15 15:41:41 +01:00
Akash Banerjee
1c7720ef78 Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)"
This reverts commit 4e6d510eb3ec5b5e5ea234756ea1f0b283feee4a.
2025-08-12 20:19:45 +01:00
Akash Banerjee
4e6d510eb3
[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-12 15:18:15 +01:00
Amit Tiwari
2074e1320f
[Clang][OpenMP] Non-contiguous strided update (#144635)
This patch handles the strided update in the `#pragma omp target update
from(data[a🅱️c])` directive where 'c' represents the strided access
leading to non-contiguous update in the `data` array when the offloaded
execution returns the control back to host from device using the `from`
clause.

Issue: Clang CodeGen where info is generated for the particular
`MapType` (to, from, etc), it was failing to detect the strided access.
Because of this, the `MapType` bits were incorrect when passed to
runtime. This led to incorrect execution (contiguous) in the
libomptarget runtime code.

Added a minimal testcase that verifies the working of the patch.
2025-08-12 19:32:15 +05:30
Akash Banerjee
0998da27e9 Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989)"
This reverts commit 5a5e8ba0c388d57aecb359ed67919cda429fc7b1.
2025-08-11 13:52:39 +01:00
Akash Banerjee
5a5e8ba0c3
[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989)
Add a new `AutomapToTargetData` pass. This gathers the declare target
enter variables which have the `AUTOMAP` modifier. And adds
`omp.declare_target_enter/exit` mapping directives for `fir.allocmem`
and `fir.freemem` oeprations on the `AUTOMAP` enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-11 13:18:38 +01:00
Aiden Grossman
2e3fd547de [Offload] Fix typo in shared_lib_fp_mapping.c
Made a typo in 963259ef6be4871e5252ff3ac9df737af5d2b4cb because I cannot
run tests and also did not review it. This should fix it...
2025-07-25 23:17:46 +00:00
Aiden Grossman
963259ef6b
[Offload] Remove uses of %T from lit tests (#150721)
This patch removes all the instances of %T from offload/ (only one test
contained this construction). %T has been deprecated for ~7 years and is
not reccomended as it does not use a unique directory per test. Switch
to using %t to ensure we use a unique dir per test and so that we can
eventually remove %T.

I did not actually test this. A couple feeble attempts at
building/running the offload tests just leaves me with a ton of test
failures. Given how small this is I'm reasonably sure it works though.
2025-07-25 16:16:22 -07:00
agozillon
73272d6fc6
[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization (#150311)
Currently, we return early whenever we've already generated an
allocation for intermediate descriptor variables (required in certain
cases when we can't directly access the base address of a passes in
descriptor function argument due to HLFIR/FIR restrictions). This
unfortunately, skips over the presence check and load/store required to
set the intermediate descriptor allocations values/data. This is fine in
most cases, but if a function happens to have a series of branches with
seperate target regions capturing the same input argument, we'd emit the
present/load/store into the first branch with the first target inside of
it, the secondary (or any preceding) branches would not have the
present/load/store, this would lead to the subsequent mapped values in
that branch being empty and then leading to a memory access violation on
device.

The fix for the moment is to emit a present/load/store at the relevant
location of every target utilising the input argument, this likely will
also lead to fixing possible issues with the input argument being
manipulated inbetween target regions (primarily resizing, the data
should remain the same as we're just copying an address around, in
theory at least). There's possible optimizations/simplifications to emit
less load/stores such as by raising the load/store out of the branches
when we can, but I'm inclined to leave this sort of optimization to
lower level passes such as an LLVM pass (which very possibly already
covers it).
2025-07-25 16:15:54 +02:00
Ye Luo
9f6784cc1f [libomptarget] fix test offloading/disable_default_device.c
Fixes the incorrect lit command line introduced in 536ba87726d8dea862d964678dbb761ca32e21fb
2025-07-09 09:52:00 -05:00
Ye Luo
536ba87726
[libomptarget] Add a test for OMP_TARGET_OFFLOAD=disabled (#146385)
closes https://github.com/llvm/llvm-project/issues/144786
2025-06-30 13:29:36 -05:00
Ethan Luis McDonough
daee5eee85
[Offload][PGO] Fix new GPU PGO tests (#143645)
`pgo_atomic_teams.c` and `pgo_atomic_threads.c` currently are set to run
on NVPTX despite the changes for that target not being upstreamed yet.
This patch also replaces instances of `llvm-profdata` with `%profdata`
in those tests.
2025-06-12 11:14:21 -05:00
Johannes Doerfert
57a90edacd
[OpenMP][GPU][FIX] Enable generic barriers in single threaded contexts (#140786)
The generic GPU barrier implementation checked if it was the main thread
in generic mode to identify single threaded regions. This doesn't work
since inside of a non-active (=sequential) parallel, that thread becomes
the main thread of a team, and is not the main thread in generic mode.
At least that is the implementation of the APIs today.

To identify single threaded regions we now check the team size
explicitly.

This exposed three other issues; one is, for now, expected and not a
bug, the second one is a bug and has a FIXME in the
single_threaded_for_barrier_hang_1.c file, and the final one is also
benign as described in the end.

The non-bug issue comes up if we ever initialize a thread state.
Afterwards we will never run any region in parallel. This is a little
conservative, but I guess thread states are really bad for performance
anyway.

The bug comes up if we optimize single_threaded_for_barrier_hang_1 and
execute it in Generic-SPMD mode. For some reason we loose all the
updates to b. This looks very much like a compiler bug, but could also
be another logic issue in the runtime. Needs to be investigated.

Issue number 3 comes up if we have nested parallels inside of a target
region. The clang SPMD-check logic gets confused, determines SPMD (which
is fine) but picks an unreasonable thread count. This is all benign, I
think, just weird:

```
  #pragma omp target teams
  #pragma omp parallel num_threads(64)
  #pragma omp parallel num_threads(10)
  {}
```
Was launched with 10 threads, not 64.
2025-05-20 19:33:54 -07:00
Ethan Luis McDonough
1043810769
[PGO][Offload] Update PGO GPU tests (#132262) 2025-05-14 17:17:52 -05:00
agozillon
f687ed9ff7
[Flang][OpenMP] Initial defaultmap implementation (#135226)
This aims to implement most of the initial arguments for defaultmap
aside from firstprivate and none, and some of the more recent OpenMP 6
additions which will come in subsequent updates (with the OpenMP 6
variants needing parsing/semantic support first).
2025-05-12 16:30:43 +02:00
agozillon
b291cfcad4
[Flang][OpenMP] Generate correct present checks for implicit maps of optional allocatables (#138210)
Currently, we do not generate the appropriate checks to check if an
optional
allocatable argument is present before accessing relevant components of
it,
in particular when creating bounds, we must generate a presence check
and we
must make sure we do not generate/keep an load external to the presence
check
by utilising the raw address rather than the regular address of the info
data structure.

Similarly in cases for optional allocatables we must treat them like
non-allocatable
arguments and generate an intermediate allocation that we can have as a
location
in memory that we can access later in the lowering without causing
segfaults when
we perform "mapping" on it, even if the end result is an empty
allocatable
(basically, we shouldn't explode if someone tries to map a non-present
optional,
similar to C++ when mapping null data).
2025-05-09 13:57:45 +02:00
Joseph Huber
92bba68634
[Offload] Fix handling of 'bare' mode when environment missing (#136794)
Summary:
We treated the missing kernel environment as a unique mode, but it was
kind of this random bool that was doing the same thing and it explicitly
expects the kernel environment to be zero. It broke after the previous
change since it used to default to SPMD and didn't handle zero in any of
the other cases despite being used. This fixes that and queries for it
without needing to consume an error.
2025-04-23 08:16:39 -05:00
agozillon
b2c9a58b8f
[Flang][OpenMP][MLIR] Check for presence of Box type before emitting store in MapInfoFinalization pass (#135477)
Currently we don't check for the presence of descriptor/BoxTypes before
emitting stores which lower to memcpys, the issue with this is that
users can have optional arguments, where they don't provide an input,
making the argument effectively null. This can still be mapped and this
causes issues at the moment as we'll emit a memcpy for function
arguments to store to a local variable for certain edge cases, when we
perform this memcpy on a null input, we cause a segfault at runtime.

The fix to this is to simply create a branch around the store that
checks if the data we're copying from is actually present. If it is, we
proceed with the store, if it isn't we skip it.
2025-04-14 17:15:56 +02:00
Joel E. Denny
ad9f6d3cee
[PGO][Offload] Use %profdata in PGO tests (#135015)
So that the wrong llvm-profdata is not picked up from PATH.
2025-04-09 10:40:46 -04:00
Jan Leyonberg
fbc8335311
[MLIR][OpenMP] Add codegen for teams reductions (#133310)
This patch adds the lowering of teams reductions from the omp dialect to
LLVM-IR. Some minor cleanup was done in clang to remove an unused
parameter.
2025-04-07 12:47:16 -04:00
Ethan Luis McDonough
0c81105373
[PGO][Offload] Disable PGO on NVPTX (#133522) 2025-03-28 16:32:32 -05:00
macurtis-amd
21a8c63cdc
[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705)
When building with asserts enabled, this can actually cause strange
miscompilations because an incorrect llvm.assume is generated at the
point of the assertion.
2025-03-28 04:53:00 -05:00
Ethan Luis McDonough
c50d39f073
[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268)
This pull request is the third part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on
https://github.com/llvm/llvm-project/pull/93365. This PR makes the
following changes:

- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to
allow the PGO version to be overridden
2025-03-19 19:01:38 -05:00
Krzysztof Parzyszek
f4fc2d731c
[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832)
Improve the check for whether a type can be passed by copy. Currently,
passing by copy is done via the OMP_MAP_LITERAL mapping, which can only
transfer as much data as can be contained in a pointer representation.
2025-03-12 19:41:11 -05:00
Krzysztof Parzyszek
d67947162f
[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.

When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.

Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).

---------

Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-03-10 08:11:01 -05:00
agozillon
f1178815d2
[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586)
This PR adds an initial implementation for the map modifiers close,
present and ompx_hold, primarily just required adding the appropriate
map type flags to the map type bits. In the case of ompx_hold it
required adding the map type to the OpenMP dialect. Close has a bit of a
problem when utilised with the ALWAYS map type on descriptors, so it is
likely we'll have to make sure close and always are not applied to the
descriptor simultaneously in the future when we apply always to the
descriptors to facilitate movement of descriptor information to device
for consistency, however, we may find an alternative to this with
further investigation. For the moment, it is a TODO/Note to keep track
of it.
2025-03-07 22:22:30 +01:00