164 Commits

Author SHA1 Message Date
Akash Banerjee
8afea0d0ea
[OpenMP][MLIR] Preserve to/from flags in mapper base entry for mappers (#159799)
With declare mapper, the parent base entry was emitted as `TARGET_PARAM`
only. The mapper received a map-type without `to/from`, causing
components to degrade to `alloc`-only (no copies), breaking allocatable
payload mapping. This PR preserves the map-type bits from the parent.

This fixes #156466.
2025-09-19 19:34:09 +01:00
Kareem Ergawy
c286a427b9
[NFC][flang][do concurent] Add saxpy offload tests for OpenMP mapping (#155993)
Adds end-to-end tests for `do concurrent` offloading to the device.


PR stack:
- https://github.com/llvm/llvm-project/pull/155754
- https://github.com/llvm/llvm-project/pull/155987
- https://github.com/llvm/llvm-project/pull/155992
- https://github.com/llvm/llvm-project/pull/155993 ◀️
- https://github.com/llvm/llvm-project/pull/157638
- https://github.com/llvm/llvm-project/pull/156610
- https://github.com/llvm/llvm-project/pull/156837
2025-09-17 07:04:13 +02:00
Jan Patrick Lehr
311d78f2a1
[OpenMP] Fix force-usm test after #157182 (#159095)
The refactoring lead to an additional data transfer. This changes the
assumed transfers in the check-strings to work with that changed
behavior.
2025-09-16 15:42:02 +02:00
Michał Górny
312b5615df
[offload] Fix finding libomptarget in runtimes build (#157856)
Per the logic in top-level CMakeLists, `libomptarget` is placed into
`LLVM_LIBRARY_OUTPUT_INTDIR` when this variable is set. Adjust the test
logic to include this directory in `-L` and `-Wl,-rpath` arguments as
well, in order to fix finding tests when building via the `runtimes`
top-level directory.

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2025-09-10 16:31:22 +02:00
agozillon
8f16af3c20
[Flang][OpenMP] Fix mapping of character type with LEN > 1 specified (#154172)
Currently, there's a number of issues with mapping characters with LEN's
specified (strings effectively). They're represented as a char type in
FIR with a len parameter, and then later on they're expanded into an
array of characters when we're translating to the LLVM dialect. However,
we don't generate a bounds for these at lowering. The fix in this PR for
this is to generate a bounds from the LEN parameter and attatch it to
the map on lowering from FIR to the LLVM dialect when we encounter this
type.
2025-09-09 16:36:04 +02:00
Joseph Huber
6d032c4df2
[OpenMP] Fix incorrect CUDA bc path after library change (#157547) 2025-09-08 17:27:59 -05:00
Julian Brown
c71da7d5e0
[OpenMP] Add tests for mapping of chained 'containing' structs (#156703)
This PR adds several new tests for mapping of chained structures, i.e.
those resembling:

  #pragma omp target map(tofrom: a->b->c)

These are currently XFAILed, although the first two tests actually work
with unified memory -- I'm not sure if it's possible to easily improve
the condition on the XFAILs in question to make them more accurate.

These cases are all fixed by the WIP PR
https://github.com/llvm/llvm-project/pull/153683.
2025-09-08 10:30:04 +01:00
Michał Górny
7a88ddd3b1
Revert "[Offload] Run unit tests as a part of check-offload" (#157346)
Reverts llvm/llvm-project#156675 due to regressions in standalone build
and test errors without all plugins enabled (#157345).
2025-09-07 15:12:15 +00:00
Jan Patrick Lehr
209d91d9e4
[Offload] Fix CHECK string in llvm-omp-device-info test (#156872) 2025-09-04 14:30:37 +02:00
Joseph Huber
99f61f3436
[Offload] Run unit tests as a part of check-offload (#156675)
Summary:
Add a dependnecy on the unit tests on the main check-offload test suite.
This matches what the other projects do, pass `llvm-lit` to the
directory to only run the lit tests, use the `check-offload-unit` for
only the unit tests.
2025-09-03 10:26:44 -05:00
Jan Patrick Lehr
27e541645c
[Offload][OpenMP] Enable more tests on AMDGPU (#156626)
(Re)enables a couple of tests that were disabled on AMDGPU for some
reason. Pass for me locally.
2025-09-03 14:04:39 +02:00
Ross Brunton
70ddd838f0
[Offload] Update tablegen tests (#156041)
These were not updated after #154736 .
2025-08-29 16:20:49 +01:00
Jan Patrick Lehr
bcb9634be8
[Offload][OpenMP] Tests require libc on GPU for printf (#155785)
These tests currently fail when libc is not configured to be built as
they require printf to be available in target regions.
2025-08-28 14:30:18 +02:00
Abhinav Gaba
bb1cb6a198
[NFC][OpenMP] Add several use_device_ptr/addr tests. (#154939)
Most tests are either compfailing or runfailing.

They should start passing once we start using ATTACH map-type based
codegen. (#153683)

Even after they start passing, there are a few places where the EXPECTED
and actual CHECKs are different, due to two main issues:
* use_device_ptr translation on `&p[0]` is not succeeding in looking-up
a previously mapped `&p[1]`
* privatization of byref use_device_addr operands is not happening
correctly.

The above should be fixed as separate standalone changes.
2025-08-25 14:23:26 -07:00
Akash Banerjee
6aafe6582d Fix test added in 1fd1d634630754cc9b9c4b5526961d5856f64ff9 2025-08-18 13:29:23 +01:00
Akash Banerjee
1fd1d63463 [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-15 15:41:41 +01:00
Abhinav Gaba
2912c9c249
[NFC][Offload] Add missing maps to OpenMP offloading tests. (#153103)
A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an
`int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be
mapped, which is incorrect.

This change fixes six such tests.
2025-08-14 12:22:28 -07:00
Akash Banerjee
1c7720ef78 Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)"
This reverts commit 4e6d510eb3ec5b5e5ea234756ea1f0b283feee4a.
2025-08-12 20:19:45 +01:00
Akash Banerjee
4e6d510eb3
[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#153048)
Add a new AutomapToTargetData pass. This gathers the declare target
enter variables which have the AUTOMAP modifier. And adds
omp.declare_target_enter/exit mapping directives for fir.alloca and
fir.free oeprations on the AUTOMAP enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-12 15:18:15 +01:00
Amit Tiwari
2074e1320f
[Clang][OpenMP] Non-contiguous strided update (#144635)
This patch handles the strided update in the `#pragma omp target update
from(data[a🅱️c])` directive where 'c' represents the strided access
leading to non-contiguous update in the `data` array when the offloaded
execution returns the control back to host from device using the `from`
clause.

Issue: Clang CodeGen where info is generated for the particular
`MapType` (to, from, etc), it was failing to detect the strided access.
Because of this, the `MapType` bits were incorrect when passed to
runtime. This led to incorrect execution (contiguous) in the
libomptarget runtime code.

Added a minimal testcase that verifies the working of the patch.
2025-08-12 19:32:15 +05:30
Akash Banerjee
0998da27e9 Revert "[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989)"
This reverts commit 5a5e8ba0c388d57aecb359ed67919cda429fc7b1.
2025-08-11 13:52:39 +01:00
Akash Banerjee
5a5e8ba0c3
[MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (#151989)
Add a new `AutomapToTargetData` pass. This gathers the declare target
enter variables which have the `AUTOMAP` modifier. And adds
`omp.declare_target_enter/exit` mapping directives for `fir.allocmem`
and `fir.freemem` oeprations on the `AUTOMAP` enabled variables.

Automap Ref: OpenMP 6.0 section 7.9.7.
2025-08-11 13:18:38 +01:00
hidekisaito
83e5a99ff6
[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882)
Enables AMD data center class GPUs to use memory manager memory pooling
up to 3GB allocation by default, up from the "1 << 13" threshold that
all plugin-nextgen devices use.
2025-08-06 14:41:20 -07:00
Aiden Grossman
2e3fd547de [Offload] Fix typo in shared_lib_fp_mapping.c
Made a typo in 963259ef6be4871e5252ff3ac9df737af5d2b4cb because I cannot
run tests and also did not review it. This should fix it...
2025-07-25 23:17:46 +00:00
Aiden Grossman
963259ef6b
[Offload] Remove uses of %T from lit tests (#150721)
This patch removes all the instances of %T from offload/ (only one test
contained this construction). %T has been deprecated for ~7 years and is
not reccomended as it does not use a unique directory per test. Switch
to using %t to ensure we use a unique dir per test and so that we can
eventually remove %T.

I did not actually test this. A couple feeble attempts at
building/running the offload tests just leaves me with a ton of test
failures. Given how small this is I'm reasonably sure it works though.
2025-07-25 16:16:22 -07:00
agozillon
73272d6fc6
[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization (#150311)
Currently, we return early whenever we've already generated an
allocation for intermediate descriptor variables (required in certain
cases when we can't directly access the base address of a passes in
descriptor function argument due to HLFIR/FIR restrictions). This
unfortunately, skips over the presence check and load/store required to
set the intermediate descriptor allocations values/data. This is fine in
most cases, but if a function happens to have a series of branches with
seperate target regions capturing the same input argument, we'd emit the
present/load/store into the first branch with the first target inside of
it, the secondary (or any preceding) branches would not have the
present/load/store, this would lead to the subsequent mapped values in
that branch being empty and then leading to a memory access violation on
device.

The fix for the moment is to emit a present/load/store at the relevant
location of every target utilising the input argument, this likely will
also lead to fixing possible issues with the input argument being
manipulated inbetween target regions (primarily resizing, the data
should remain the same as we're just copying an address around, in
theory at least). There's possible optimizations/simplifications to emit
less load/stores such as by raising the load/store out of the branches
when we can, but I'm inclined to leave this sort of optimization to
lower level passes such as an LLVM pass (which very possibly already
covers it).
2025-07-25 16:15:54 +02:00
hidekisaito
75e60e745b
[AMDGPU][Offload][LIT] Run unified_shared_memory tests on gfx950 (#150372)
Enables 9 more tests
2025-07-23 22:46:26 -07:00
Ye Luo
9f6784cc1f [libomptarget] fix test offloading/disable_default_device.c
Fixes the incorrect lit command line introduced in 536ba87726d8dea862d964678dbb761ca32e21fb
2025-07-09 09:52:00 -05:00
Abhinav Gaba
ae4a81e849
[NFC][OpenMP] Add tests for mapping pointers and their dereferences. (#146934)
The output of the compile-and-run tests is incorrect. These will be used
for reference in future commits that resolve the issues.

Also updated the existing clang LIT test,
target_map_both_pointer_pointee_codegen.cpp, with more constructs and
fewer CHECKs (through more update_cc_test_checks filters).
2025-07-08 06:52:38 -04:00
Ye Luo
536ba87726
[libomptarget] Add a test for OMP_TARGET_OFFLOAD=disabled (#146385)
closes https://github.com/llvm/llvm-project/issues/144786
2025-06-30 13:29:36 -05:00
Julian Brown
b62b58d1bb
[OpenMP] Fix crash with duplicate mapping on target directive (#146136)
OpenMP allows duplicate mappings, i.e. in OpenMP 6.0, 7.9.6 "map
Clause":

  Two list items of the map clauses on the same construct must not share
  original storage unless one of the following is true: they are the same
  list item [or other omitted reasons]"

Duplicate mappings can arise as a result of user-defined mapper
processing (which I think is a separate bug, and is not addressed here),
but also in straightforward cases such as:

  #pragma omp target map(tofrom: s.mem[0:10]) map(tofrom: s.mem[0:10])

Both these cases cause crashes at runtime at present, due to an
unfortunate interaction between reference counting behaviour and shadow
pointer handling for blocks. This is what happens:

  1.  The member "s.mem" is copied to the target
  2.  A shadow pointer is created, modifying the pointer on the target
  3.  The member "s.mem" is copied to the target again
  4. The previous shadow pointer metadata is still present, so the runtime doesn't modify the target pointer a second time.

The fix is to disable step 3 if we've already done step 2 for a given
block that has the "is new" flag set.
2025-06-29 22:41:24 +01:00
Ross Brunton
02d2a1646a
[Offload] Fix entry_points.td test (#145292)
This was broken as part of #144494 , and just needs an update to the
check lines.
2025-06-23 11:09:08 +01:00
Ethan Luis McDonough
daee5eee85
[Offload][PGO] Fix new GPU PGO tests (#143645)
`pgo_atomic_teams.c` and `pgo_atomic_threads.c` currently are set to run
on NVPTX despite the changes for that target not being upstreamed yet.
This patch also replaces instances of `llvm-profdata` with `%profdata`
in those tests.
2025-06-12 11:14:21 -05:00
Abhinav Gaba
02b6849cf1
[Clang][OpenMP] Fix mapping of arrays of structs with members with mappers (#142511)
This builds upon #101101 from @jyu2-git, which used compiler-generated
mappers when mapping an array-section of structs with members that have
user-defined default mappers.

Now we do the same when mapping arrays of structs.
2025-06-11 19:03:55 +00:00
Joseph Huber
051945304b
[Offload] Fix APU detection for MI300 testing (#143026)
Summary:
We have this check when the target is MI300 but it fails if this
environment variable isn't set. Set a default value of '0' if not
present so that will be converted to bool false.
2025-06-05 15:31:55 -05:00
Jan Patrick Lehr
e97f42e931
[OpenMP][Offload] Fix typo in error message (#142589)
It appears that the spelling was incorrect in those test cases. At least
on machines with ROCm version > 6.3.

I had no chance to test with ROCm version version < 6.2 and would be
interested in the result if someone has the chance.
2025-06-03 07:33:45 -05:00
Joseph Huber
b26baf1779
[Offload] Make AMDGPU plugin handle empty allocation properly (#142383)
Summary:
`malloc(0)` and `free(nullptr)` are both defined by the standard but we
current trigger erros and assertions on them. Fix that so this works
with empty arguments.
2025-06-02 08:12:20 -05:00
Ross Brunton
a1191b4875
[Offload] Fix broken tablegen test after #140879 (#141796) 2025-05-28 11:30:15 -05:00
Johannes Doerfert
57a90edacd
[OpenMP][GPU][FIX] Enable generic barriers in single threaded contexts (#140786)
The generic GPU barrier implementation checked if it was the main thread
in generic mode to identify single threaded regions. This doesn't work
since inside of a non-active (=sequential) parallel, that thread becomes
the main thread of a team, and is not the main thread in generic mode.
At least that is the implementation of the APIs today.

To identify single threaded regions we now check the team size
explicitly.

This exposed three other issues; one is, for now, expected and not a
bug, the second one is a bug and has a FIXME in the
single_threaded_for_barrier_hang_1.c file, and the final one is also
benign as described in the end.

The non-bug issue comes up if we ever initialize a thread state.
Afterwards we will never run any region in parallel. This is a little
conservative, but I guess thread states are really bad for performance
anyway.

The bug comes up if we optimize single_threaded_for_barrier_hang_1 and
execute it in Generic-SPMD mode. For some reason we loose all the
updates to b. This looks very much like a compiler bug, but could also
be another logic issue in the runtime. Needs to be investigated.

Issue number 3 comes up if we have nested parallels inside of a target
region. The clang SPMD-check logic gets confused, determines SPMD (which
is fine) but picks an unreasonable thread count. This is all benign, I
think, just weird:

```
  #pragma omp target teams
  #pragma omp parallel num_threads(64)
  #pragma omp parallel num_threads(10)
  {}
```
Was launched with 10 threads, not 64.
2025-05-20 19:33:54 -07:00
Ethan Luis McDonough
1043810769
[PGO][Offload] Update PGO GPU tests (#132262) 2025-05-14 17:17:52 -05:00
agozillon
f687ed9ff7
[Flang][OpenMP] Initial defaultmap implementation (#135226)
This aims to implement most of the initial arguments for defaultmap
aside from firstprivate and none, and some of the more recent OpenMP 6
additions which will come in subsequent updates (with the OpenMP 6
variants needing parsing/semantic support first).
2025-05-12 16:30:43 +02:00
agozillon
b291cfcad4
[Flang][OpenMP] Generate correct present checks for implicit maps of optional allocatables (#138210)
Currently, we do not generate the appropriate checks to check if an
optional
allocatable argument is present before accessing relevant components of
it,
in particular when creating bounds, we must generate a presence check
and we
must make sure we do not generate/keep an load external to the presence
check
by utilising the raw address rather than the regular address of the info
data structure.

Similarly in cases for optional allocatables we must treat them like
non-allocatable
arguments and generate an intermediate allocation that we can have as a
location
in memory that we can access later in the lowering without causing
segfaults when
we perform "mapping" on it, even if the end result is an empty
allocatable
(basically, we shouldn't explode if someone tries to map a non-present
optional,
similar to C++ when mapping null data).
2025-05-09 13:57:45 +02:00
Callum Fare
6022a5214b
[Offload] Add check-offload-unit for liboffload unittests (#137312)
Adds a `check-offload-unit` target for running the liboffload unit test
suite. This unit test binary runs the tests for every available device.
This can optionally filtered to devices from a single platform, but the
check target runs on everything.

The target is not part of `check-offload` and does not get propagated to
the top level build. I'm not sure if either of these things are
desirable, but I'm happy to look into it if we want.

Also remove the `offload/unittests/Plugins` test as it's dead code and
doesn't build.
2025-04-29 11:21:59 -05:00
Joseph Huber
92bba68634
[Offload] Fix handling of 'bare' mode when environment missing (#136794)
Summary:
We treated the missing kernel environment as a unique mode, but it was
kind of this random bool that was doing the same thing and it explicitly
expects the kernel environment to be zero. It broke after the previous
change since it used to default to SPMD and didn't handle zero in any of
the other cases despite being used. This fixes that and queries for it
without needing to consume an error.
2025-04-23 08:16:39 -05:00
Callum Fare
800d949bb3
[Offload] Implement the remaining initial Offload API (#122106)
Implement the complete initial version of the Offload API, to the extent
that is usable for simple offloading programs. Tested with a basic SYCL
program.

As far as possible, these are simple wrappers over existing
functionality in the plugins.

* Allocating and freeing memory (host, device, shared).
* Creating a program 
* Creating a queue (wrapper over asynchronous stream resource)
* Enqueuing memcpy operations
* Enqueuing kernel executions
* Waiting on (optional) output events from the enqueue operations
* Waiting on a queue to finish

Objects created with the API have reference counting semantics to handle
their lifetime. They are created with an initial reference count of 1,
which can be incremented and decremented with retain and release
functions. They are freed when their reference count reaches 0. Platform
and device objects are not reference counted, as they are expected to
persist as long as the library is in use, and it's not meaningful for
users to create or destroy them.

Tests have been added to `offload.unittests`, including device code for
testing program and kernel related functionality.

The API should still be considered unstable and it's very likely we will
need to change the existing entry points.
2025-04-22 13:27:50 -05:00
Joseph Huber
5eabececb0 [Offload] Fix JIT test 2025-04-18 12:01:04 -05:00
Joseph Huber
6c5f50f186 [Offload] Fix typo on -Xoffload-linker 2025-04-18 10:47:45 -05:00
Joseph Huber
db0f754c5a
[OpenMP] Remove 'libomptarget.devicertl.a' fatbinary and use static library (#126143)
Summary:
Currently, we build a single `libomptarget.devicertl.a` which is a
fatbinary. It is a host object file that contains the embedded archive
files for both the NVIDIA and AMDGPU targets. This was done primarily as
a convenience due to naming conflicts. Now that the clang driver for the
GPU targets can appropriate link via the per-target runtime-dir, we can
just make two separate static libraries and remove the indirection.

This patch creates two new static libraries that get installed into
```
/lib/amdgcn-amd-amdhsa/libomp.a
/lib/nvptx64-nvidia-cuda/libomp.a
```
for AMDGPU and NVPTX respectively. The link job created by the linker
wrapper now simply needs to do `-lomp` and it will search those
directories and link those static libraries. This requires far less
special handling.

This patch is a precursor to changing the build system entirely to be a
runtimes based one. Soon this target will be a standard `add_library`
and done through the GPU runtime targets.

NOTE that this actually does remove an additional optimization step.
Previously we merged all of the files into a single bitcode object and
forcibly internalized some definitions. This, instead, just treats them
like a normal static library. This may possibly affect performance for
some files, but I think it's better overall to use static library
semantics because it allows us to have an 'include-what-you-use'
relationship with the library.

Performance testing will be required. If we really need the merged blob
then we can simply pack that into a new static library.
2025-04-18 07:43:31 -05:00
agozillon
b2c9a58b8f
[Flang][OpenMP][MLIR] Check for presence of Box type before emitting store in MapInfoFinalization pass (#135477)
Currently we don't check for the presence of descriptor/BoxTypes before
emitting stores which lower to memcpys, the issue with this is that
users can have optional arguments, where they don't provide an input,
making the argument effectively null. This can still be mapped and this
causes issues at the moment as we'll emit a memcpy for function
arguments to store to a local variable for certain edge cases, when we
perform this memcpy on a null input, we cause a segfault at runtime.

The fix to this is to simply create a branch around the store that
checks if the data we're copying from is actually present. If it is, we
proceed with the store, if it isn't we skip it.
2025-04-14 17:15:56 +02:00
Joseph Huber
2f41fa387d
[AMDGPU] Fix code object version not being set to 'none' (#135036)
Summary:
Previously, we removed the special handling for the code object version
global. I erroneously thought that this meant we cold get rid of this
weird `-Xclang` option. However, this also emits an LLVM IR module flag,
which will then cause linking issues.
2025-04-10 11:31:21 -05:00