142 Commits

Author SHA1 Message Date
fineg74
2890f9883c
[OFFLOAD] Improve handling of synchronization errors in L0 plugin and reenable tests (#186927)
This change improves handling of errors during synchronization in Level
Zero plugin by ensuring cleanup of queues and events in case of an
synchronization error. As a result multiple tests stopped hanging.

---------

Co-authored-by: Duran, Alex <alejandro.duran@intel.com>
2026-03-18 05:50:06 +01:00
Nick Sarnie
1beec14434
[offload][lit] XFAIL new tests failing on intelgpu (#185908)
New tests from https://github.com/llvm/llvm-project/pull/176708 and
https://github.com/llvm/llvm-project/pull/181987 fail on `intelgpu`, I
updated the [GH
issue](https://github.com/llvm/llvm-project/issues/182897).

Example fails
[here](https://lab.llvm.org/buildbot/#/builders/225/builds/3441).

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-03-12 14:37:56 +00:00
Kevin Sala Penades
1f583c6dee
[OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (#152831)
Part 3 adding offload runtime support. See
https://github.com/llvm/llvm-project/pull/152651.

---------

Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
2026-03-12 01:13:06 -07:00
Amit Tiwari
a15dcd4117
[Clang][OpenMP] Handled NonContig Descriptor DimCount (#181987)
### Issue: Dimension override missing
When variable count expressions were used with stride, the constant
subsection path computed size first. This marked `ArgSizes` with byte
size semantics. Variable expression logic later triggered, but reused
`ArgSizes` assuming "bytes" semantics

`OMPIRBuilder.cpp` didn't handle dimension count for
`OMP_MAP_NON_CONTIG` flag

**Result**: `ArgSizes` wasn't overwritten with dimension count, breaking
non-contiguous mapping.

**Fixes**:

`llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp` - Expression semantics for
non-contiguous.
 stride/count.
Generate 3D descriptor structures with runtime dimensions.
Fix dimension override to use dimension count instead of byte size.

Added testcases to cover stack arrays, heap pointers, struct members,
etc.
2026-03-11 19:39:55 +05:30
Amit Tiwari
14de1bb711
[Clang][OpenMP] Support expression semantics in target update fields with non-contiguous array sections (#176708)
### Issue: Variable stride not recognized as non-contiguous
`CGOpenMPRuntime.cpp` failed to detect `DeclRefExpr`, `MemberExpr`,
`ArraySubscriptExpr` as non-contiguous.

**Fixes**:

`clang/lib/CodeGen/CGOpenMPRuntime.cpp` - Variable stride detection +
dimension count logic
Detect variable stride expressions
(`DeclRefExpr/MemberExpr/ArraySubscriptExpr`) as non-contiguous

Added testcases to cover stack arrays, heap pointers, struct members,
etc., for expression semantics in non-contiguous update.
2026-03-10 19:26:17 +05:30
Kareem Ergawy
0bf9bb5c42
[Flang][OpenMP] Fix close map flag propagation for derived types in USM (#185330)
This fixes a bug in USM mode where the `close` map type modifer was
attached to some `map.info.op`'s corresponding to user-defined type
members while the parent type instance itself is not marked as `close`.

This fix ensures that if a parent record type map does not have the
'close' flag, it is cleared from its members as well, maintaining
consistency.

Gemini was used to create tests. AI generated test code was reviewed
line-by-line by me. Which were derived from a reproducer I was working
with to debug the issue.

Assisted-by: Gemini <gemini@google.com>
2026-03-09 15:55:53 +01:00
Krish Gupta
148b10be8a
[flang][OpenMP] Support custom mappers in target update to/from clauses (#169673)
Implement support for the OpenMP `mapper` modifier on `target update` `to` and
`from` clauses in Flang.

Semantic name resolution is extended to bind the mapper symbol for
`OmpClause::To` and `OmpClause::From` via a shared `ResolveMapperModifier`
helper. Lowering is extended in `ClauseProcessor` with a `getMapperIdentifier`
template helper to extract the mapper name for both `map` and `target update`
clauses and forward it to `omp.map_info`.

Fixes #168701.

Reviewed By: TIFitis, kparzysz
Assited By: Copilot( For review and articulations of messages)
2026-03-02 21:59:56 +05:30
Joseph Huber
c49460bae7
[flang-rt] Enable more runtime functions for the GPU target (#183649)
Summary:
This enables primarily `stop.cpp` and `descriptor.cpp`. Requires a
little bit of wrangling to get it to compile. Unlike the CUDA build,
this build uses an in-tree libc++ configured for the GPU. This is
configured without thread support, environment, or filesystem, and it is
not POSIX at all. So, no mutexes, pthreads, or get/setenv.

I tested stop, but i don't know if it's actually legal to exit from
OpenMP offloading.
2026-02-27 12:27:39 -06:00
Nick Sarnie
600919ac32
[Offload][clang-linker-wrapper][SPIRV] Tell spirv-link to not optimize out exported symbols (#182930)
`spirv-link` seems to internalize all symbols, which ends up causing the
OpenMP Device Environment global generated by the OMP FE to get
optimized out which causes `liboffload` to run in the wrong
parallelization mode which breaks at least one liboffload lit test.

Pass `--create-library` to tell it not to do that.

```
  --create-library
               Link the binaries into a library, keeping all exported symbols.
```

This fixes the test.

Closes: https://github.com/llvm/llvm-project/issues/182901

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-02-24 15:10:59 +00:00
Joseph Huber
70b5a1d050
[flang-rt] Add support for formatted I/O on the GPU (#182580)
Summary:
Expands on the previous support to enable formatted output, characters,
and checking basic iostat. We intentionally do not handle cases where
the descriptor is non-null as this is a non-trivial class that cannot
easily be shepherded across the wire.
2026-02-20 14:43:06 -06:00
Joseph Huber
21b3461440
[flang-rt] Implement basic support for I/O from OpenMP GPU Offloading (#181039)
Summary:
This PR provides the minimal support for Fortran I/O coming from a GPU
in OpenMP offloading. We use the same support the `libc` uses for its
printing through the RPC server. The helper functions `rpc::dispatch`
and `rpc::invoke` help make this mostly automatic.

Becaus Fortran I/O is not reentrant, the vast majority of complexity
comes from needing to stitch together calls from the GPU until they can
be executed all at once. This is needed not only because of the
limitations of recursive I/O, but without this the output would all be
interleaved because of the GPU's lock-step execution.

As such, the return values from the intermediate functions are
meaningless, all returning true. The final value is correct however. For
cookies we create a context pointer on the server to chain these
together.

Works on both my AMD and NVIDIA GPUs.
```fortran
program hello_gpu
  implicit none

  !$omp target teams num_teams(1)
  !$omp parallel num_threads(2)
    ! Print strings
    print *, "Hello from GPU"
  !$omp end parallel
  !$omp end target teams

end program hello_gpu
```
```console
> flang hello.f90 -O2 -fopenmp --offload-arch=gfx1030 
> ./a.out 
 Hello from GPU
 Hello from GPU
> flang hello.f90 -O2 -fopenmp --offload-arch=sm_89  
> ./a.out 
 Hello from GPU
 Hello from GPU
```
2026-02-20 07:56:59 -06:00
Nick Sarnie
78ff5b55fd
[offload][lit] Enable/disable tests on Level Zero when using DeviceRTL (#182128)
Since we can now build the DeviceRTL with SPIR-V, redo the
`XFAIL/UNSUPPORTED` specifications for the tests we see passing/failing
on the Level Zero backend with the DeviceRTL being used.

The tests marked `UNSUPPORTED` hang or sporadically fail and those are
tracked in https://github.com/llvm/llvm-project/issues/182119.

This change will allow us to enable CI testing with the DeviceRTL.

Here are the full test results with this change applied, running only
the `spirv64-intel` `check-offload` tests:

```
Total Discovered Tests: 453
  Unsupported      : 206 (45.47%)
  Passed           : 141 (31.13%)
  Expectedly Failed: 106 (23.40%)
```

31% is not a bad start.

---------

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-02-18 21:53:19 +00:00
Robert Imschweiler
33e3384aa8
[offload] Adapt tests to new PluginInterface quoting [NFC] (#180505)
4096cb6017
removed the quotes around PluginInterface
2026-02-09 07:47:04 -06:00
Akash Banerjee
7c07cb6542
[MLIR][OpenMP] Fix recursive mapper emission. (#178453)
Recursive types can cause re-entrant mapper emission. The mapper
function is created by OpenMPIRBuilder before the callbacks run, so it
may already exist in the LLVM module even though it is not yet
registered in the ModuleTranslation mapping table. Reuse and register it
to break the recursion. Added offloading test.
2026-01-29 16:38:33 +00:00
Akash Banerjee
c856c3d045
[MLIR][OpenMP] Fix mapper being attached to partial maps. (#178247)
Fix OpenMP mapper lowering by attaching user-defined/default mappers
only to the base parent entry, not combined/segment entries. This
prevents mapper calls with partial sizes. Added relevant tests.
2026-01-28 18:35:03 +00:00
Akash Banerjee
21b0fdf8ec
[Flan][OpenMP] Implement TODO support for compatible defaultmap types for implicit mappers (#177389)
Make implicit default mapper generation respect defaultmap categories so
unrelated defaultmap clauses no longer suppress mappers for derived
types.
Added related tests.
2026-01-27 14:45:40 +00:00
Abid Qadeer
7a74e7fba3
[flang][OpenMP] Fix mapping of constant arrays. (#176763)
The compiler skips mapping of named constants (parameters) to OpenMP
target regions under the assumption that constants don't need to be
mapped. This assumption is not valid when array is accessed inside with
dynamic index. The problem can be seen with the following code:

```
module fir_lowering_check
  implicit none

    integer, parameter :: dp = selected_real_kind(15, 307)
    real(dp), parameter :: arrays(2) = (/ 0.0, 0.0 /)

contains

subroutine test(hold)

        integer, intent(in) :: hold
        integer :: z
        real(dp) :: temp

        !$omp target teams distribute parallel do
            do z = 1, 2
                  temp = arrays(hold)
            end do
        !$omp end target teams distribute parallel do

    end subroutine test
end module fir_lowering_check

program main
  use fir_lowering_check

  implicit none
    integer :: hold
    hold = 1
    call test(hold)
    print *, "Finished"

end program main
```

It fails with the following error
`'hlfir.designate' op using value defined outside the region`

The fix is to allow mapping of constant arrays and map them as `to`.
2026-01-21 13:40:01 +00:00
Nick Sarnie
75cc3cd1e8
[offload][lit] XFAIL failing non-contiguous update tests on Intel (#176955)
The new tests added in https://github.com/llvm/llvm-project/pull/169623
are [failing](https://lab.llvm.org/buildbot/#/builders/225/builds/544)
on the Intel GPU runner, so XFAIL them as with the other tests.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2026-01-20 16:39:25 +00:00
Amit Tiwari
ae1ee0d377
[Offload][Tests] Non-contiguous_update_to_tests (#169623)
PR #144635 enabled non-contiguous updates for both `update from` and
`update to` clauses, but tests for `update to` were missing. This PR
adds those missing tests to ensure coverage.
2026-01-20 12:03:39 +05:30
Jan Patrick Lehr
4a8a0593bd
[Offload] Fix failing Fortran test w/ line number (#175247)
This test also depends on the line number. Following similar approach as
other with [[@LINE]] macro.
2026-01-09 21:12:28 +00:00
Joseph Huber
c722ef4874
[OpenMP] Remove testing LTO variant on CPU targets (#175187)
Summary:
This is only really meaningful for the NVPTX target. Not all build
environments support host LTO and these are redundant tests, just clean
this up and make it run faster.
2026-01-09 10:13:44 -06:00
Kareem Ergawy
e82399dac2
[flang][OpenMP] Prevent omp.map.info ops with user-defined mappers from being marked as parial maps (#175133)
The following test was triggering a runtime crash **on the host before
launching the kernel**:
```fortran
program test_omp_target_map_bug_v5
  implicit none
  type nested_type
    real, allocatable :: alloc_field(:)
  end type nested_type

  type nesting_type
    integer :: int_field
    type(nested_type) :: derived_field
  end type nesting_type

  type(nesting_type) :: config

  allocate(config%derived_field%alloc_field(1))

  !$OMP TARGET ENTER DATA MAP(TO:config, config%derived_field%alloc_field)

  !$OMP TARGET
  config%derived_field%alloc_field(1) = 1.0
  !$OMP END TARGET

  deallocate(config%derived_field%alloc_field)
end program test_omp_target_map_bug_v5
```

In particular, the runtime was producing a segmentation fault when the
test is compiled with any optimization level > 0; if you compile with
-O0 the sample ran fine.

After debugging the runtime, it turned out the crash was happening at
the point where the runtime calls the default mapper emitted by the
compiler for `nesting_type; in particular at this point in the runtime:
c62cd2877c/offload/libomptarget/omptarget.cpp (L307).

Bisecting the optimization pipeline using `-mllvm -opt-bisect-limit=N`,
the first pass that triggered the issue on `O1` was the `instcombine`
pass. Debugging this further, the issue narrows down to canonicalizing
`getelementptr` instructions from using struct types (in this case the
`nesting_type` in the sample above) to using addressing bytes (`i8`). In
particular, in `O0`, you would see something like this:
```llvm
define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 {
entry:
  %6 = udiv exact i64 %3, 56
  %7 = getelementptr %_QFTnesting_type, ptr %2, i64 %6
  ....
}
```

```llvm
define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 {
entry:
  %6 = getelementptr i8, ptr %2, i64 %3
  ....
}
```

The `udiv exact` instruction emitted by the OMP IR Builder (see:
c62cd2877c/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (L9154))
allows `instcombine` to assume that `%3` is divisible by the struct size
(here `56`) and, therefore, replaces the result of the division with
direct GEP on `i8` rather than the struct type.

However, the runtime was calling
`@.omp_mapper._QQFnesting_type_omp_default_mapper` not with `56` (the
proper struct size) but with `48`!

Debugging this further, I found that the size of `omp.map.info`
operation to which the default mapper is attached computes the value of
`48` because we set the map to partial (see:
c62cd2877c/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp (L1146)
and
c62cd2877c/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (L4501-L4512)).

However, I think this is incorrect since the emitted mapper (and
user-defined mappers in general) are defined on the whole struct type
and should never be marked as partial. Hence, the fix in this PR.
2026-01-09 15:15:10 +01:00
Jan Patrick Lehr
13fb3f3b19
[Offload] Fix line numbers after #174804 (#174932)
The changes in line numbers caused a few CHECK macros to now fail. This
is fixed by this PR.

Build w/ breakages:
https://lab.llvm.org/staging/#/builders/105/builds/39748
2026-01-08 12:08:45 +01:00
Nick Sarnie
26b777444b
[offload][lit] XFAIL all failing tests on the Level Zero plugin (#174804)
We finally got our buildbot added (to staging, at least) so we want to
start running L0 tests in CI.
We need `check-offload` to pass though, so XFAIL everything failing.
There's a couple `UNSUPPORTED` as well, those are for sporadic fails.

Also make set the `gpu` and `intelgpu` LIT variables when testing the
`spirv64-intel` triple.

We have no DeviceRTL yet so basically everything fails, but we manage to
get

```
Total Discovered Tests: 432
Unsupported      : 169 (39.12%)
Passed           :  67 (15.51%)
Expectedly Failed: 196 (45.37%)
```

We still don't build the level zero plugin by default and these tests
don't run unless the plugin was built, so this has no effect on most
builds.

---------

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-01-07 19:20:30 +00:00
Nick Sarnie
08a43f854c
[offload][lit] Use '%not' instead of 'not' in requires.c (#174506)
Typo exposed by recent `not` behavior change, we need to make sure we're
using the LLVM one.

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-01-05 23:31:15 +00:00
Nick Sarnie
13e3527412
[offload][lit] Fix requires.c after 'not' behavior change (#174499)
`not` behavior change in
https://github.com/llvm/llvm-project/pull/174298 requires `--crash`
passed now.

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2026-01-05 23:14:09 +00:00
Michał Górny
05349a9b43
[offload] [test] Mark bug 51781 test as requiring GPU (#174284)
While the main problem with the test is that it requires LLD, given that
it is unlikely to be testing anything meaningful for a CPU-only build,
just mark it as requiring GPU.

Fixes #100780

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2026-01-03 19:25:51 +01:00
Amit Tiwari
230b437d05
[Clang][OpenMP] Handle check for non-contiguous mapping in pointer-based array sections (#157443)
### 1. ElementType deduction for pointer-based array sections

Problem: Pointer-based array sections were previously ignored during
`ElementType` deduction, leading to incorrect assumptions about array
item types.

This often resulted in out-of-bounds access, as seen in the assertion
failure:
```
Assertion `idx < size()' failed.
llvm-project/llvm/include/llvm/ADT/SmallVector.h:292:
reference llvm::SmallVectorTemplateCommon<llvm::Value *>::operatorsize_type
[T = llvm::Value *]

```
Fix: Added a check in clang/lib/CodeGen/CGOpenMPRuntime.cpp to ensure
`ElementType` is correctly detected for cases involving non-contiguous
updates with a base pointer.
Impact: Resolves failures in OpenMP_VV (formerly sollve_vv) and other
offload/clang-OpenMP tests:

All tests under:

https://github.com/OpenMP-Validation-and-Verification/OpenMP_VV/tree/master/tests/5.0/target_update

test_target_update_mapper_from_discontiguous.c
test_target_update_mapper_to_discontiguous.c
test_target_update_to_discontiguous.c
test_target_update_from_discontiguous.c



### 2. Zero-dimension propagation in struct member mappings

Problem: A zero-dimension entry for struct members introduced
inconsistencies in complex mapping logic within OMPIRBuilder.cpp.

Placeholder zeros propagated to emitNonContiguousDescriptor(), breaking
reverse indexing logic and corrupting IR:

Loops assume `Dims[I] >= 1`. When `Dims[I] == 0`:

Reverse indexing still stores pointers to uninitialized allocas or
mismatched slots. Runtime interprets `ArgSizes[I]` (derived from
`Dims[I])` as dimensionality, causing size/offset calculations to
collapse to zero → results in `size=0` async copy and plugin interface
errors.

Fix: Prepend a synthetic dimension of size 1 instead of appending a
zero, preserving correctness in `targetDataUpdate()` for non-contiguous
updates.
Impact: Added dedicated test cases that previously failed on main.
2025-12-23 12:57:12 +05:30
Hansang Bae
c3a5ec0360
[NFC][Offload] Missing test change in #153683 (#172587) 2025-12-17 09:03:54 -06:00
Abhinav Gaba
1fbf33cd40
[OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. (#153683)
This adds support for using `ATTACH` map-type for proper
pointer-attachment when mapping list-items that have base-pointers.

For example, for the following:

```c
  int *p;
  #pragma omp target enter data map(p[1:10])
```

The following maps are now emitted by clang:
```
  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH
```

Previously, the two possible maps emitted by clang were:
```
  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.

-----

With this change, we are using ATTACH-style maps, like `(A)`, for cases
where the expression has a base-pointer. For example:


```cpp
  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

```

#### Grouping of maps based on attach base-pointers
We also group mapping of clauses with the same base decl in the order of
the increasing complexity of their base-pointers, e.g. for something
like:
```
  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                 // attach-ptr: spp
  map(spp),                    // attach-ptr: N/A
```

We first map `spp`, then `spp[0]` then `spp[0][0]` and `spp[0][0].a`.

This allows us to also group "struct" allocation based on their attach
pointers. This resolves the issues of us always mapping everything from
the beginning of the symbol `spp`. Each group is mapped independently,
and at the same level, like `spp[0][0]` and its member `spp[0][0].a`, we
still get map them together as part of the same contiguous struct
`spp[0][0]`. This resolves issue #141042.

#### use_device_ptr/addr fixes
The handling of `use_device_ptr/addr` was updated to use the attach-ptr
information, and works for many cases that were failing before. It has
to be done as part of this series because otherwise, the switch from
ptr_to_obj to attach-style mapping would have caused regressions in
existing use_device_ptr/addr tests.

#### Handling of attach-pointers that are members of implicitly mapped
structs:
* When a struct member-pointer, like `p` below, is a base-pointer in a
`map` clause on a target construct (like `map(p[0:1])`, and the base of
that struct is either the `this` pointer (implicitly or explicitly), or
a struct that is implicitly mapped on that construct, we add an implicit
`map(p)` so that we don't implicitly map the full struct.
 ```c
  struct S { int *p;
  void f1() {
    #pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure
// that the implicit map of `this[:]` does
                                   // not map the full struct
       printf("%p %p\n", &p, p);
  }
 ```

#### Scope for improvement:
* We may be able to compute attach-ptr expr while collecting
component-lists in Sema.
* But we cache the computation results already, and `findAttachPtrExpr`
is fairly simple, and fast.
* There may be a better way to implement semantic expr comparison.

#### Needs future work:
* Attach-style maps not yet emitted for declare mappers.
* Mapping of class member references: We are still using PTR_AND_OBJ
maps for them. We will likely need to change that to handle
`ref_ptr/ref_ptee`, and `attach` map-type-modifier on them.
* Implicit capturing of "this" needs to map the full `this[0:1]` unless
there is an explicit map on one of the members, or a map with a member
as its base-pointer.
* Implicit map added for capturing a class member pointer needs to also
add a zero-length-array-section map.
* `use_device_addr` on array-sections-on-pointers need further
improvements (documented using FIXMEs)

#### Why a large PR
While it's unfortunate that this PR has gotten large and difficult to
review, the issue is that all the functional changes have to be made
together, to prevent regressions from partially implemented changes.

For example, the changes to capturing were previously done separately
(#145454), but they would still cause stability issues in absence of
full attach-mapping. And attach-mapping needs those changes to be able
to launch kernels.

We extracted the utilities and functions, like those for finding
attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't
call those functions, just adds them (#155625). Maybe the change that
adds a new error message for use_device_addr on array-sections with
non-var base-pointers could have been extracted out too (but that would
have had to be a follow-up change in that case, and we would get
comp-fails with this PR when the erroneous case was not
caught/diagnosed).

---------

Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-12-15 16:40:31 -08:00
Abhinav Gaba
f44740afff
[NFC][Offload] Fix minor debug print issues introduced in #170425. (#172377) 2025-12-15 15:10:24 -08:00
Kevin Sala Penades
35315a84b4
[offload] Fix CUDA args size by subtracting tail padding (#172249)
This commit makes the cuLaunchKernel call to pass the total arguments size without tail padding.
2025-12-14 21:57:25 -08:00
Akash Banerjee
b360a782ca
Reland "[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331)" (#170851)
Add support for OpenMP is_device_ptr clause for target directives.

[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr
#169367 This PR adds support for the OpenMP is_device_ptr clause in the
MLIR to LLVM IR translation for target regions. The is_device_ptr clause
allows device pointers (allocated via OpenMP runtime APIs) to be used
directly in target regions without implicit mapping.
2025-12-05 17:38:41 +00:00
theRonShark
be79a0d90f
Revert "[Flang][OpenMP] Add lowering support for is_device_ptr clause" (#170778)
Reverts llvm/llvm-project#169331
2025-12-04 19:38:16 -05:00
Akash Banerjee
a77c4948a5
[Flang][OpenMP] Add lowering support for is_device_ptr clause (#169331)
Add support for OpenMP is_device_ptr clause for target directives.

[MLIR][OpenMP] Add OpenMPToLLVMIRTranslation support for is_device_ptr #169367
This PR adds support for the OpenMP is_device_ptr clause in the MLIR to LLVM IR translation for target regions. The is_device_ptr clause allows device pointers (allocated via OpenMP runtime APIs) to be used directly in target regions without implicit mapping.
2025-12-04 15:57:24 +00:00
Nick Sarnie
b3b83ac1e8
[offload][lit] Fix compilation of two offload tests (#169399)
These are C tests, not C++, so no function parameters means unspecified
number of parameters, not `void`.

These compile fine on the current tested offload targets because an
error is only
[thrown](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDecl.cpp#L10695)
if the calling convention doesn't support variadic arguments, which they
happen to.

When compiling this test for other targets that do not support variadic
arguments, we get an error, which does not seem intentional.

Just add `void` to the parameter list.

---------

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2025-11-25 15:16:15 +00:00
Jan Leyonberg
3e86f05621
[OpenMP][flang] Lowering of OpenMP custom reductions to MLIR (#168417)
This patch add support for lowering of custom reductions to MLIR. It
also enhances the capability of the pass to automatically mark functions
as "declare target" by traversing custom reduction initializers and
combiners.
2025-11-24 16:00:46 -05:00
agozillon
173600880b
[Flang][OpenMP][MLIR] Initial declare target to for variables implementation (#119589)
While the infrastructure for declare target to/enter and link for
variables exists in the MLIR dialect and at the Flang level, the current
lowering from MLIR -> LLVM IR isn't in place, it's only in place for
variables that have the link clause applied.

This PR aims to extend that lowering to an initial implementation that
incorporates declare target to as well, which primarily requires changes
in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the
OpenMP dialect was required to extend the declare target enumerator to
include a default None field as well.

This also requires a minor change to the Flang lowering's
MapInfoFinlization.cpp pass to alter the map type for descriptors to
deal with cases where a variable is marked declare to. Currently, when a
descriptor variable is mapped declare target to the descriptor component
can become attatched, and cannot be updated, this results in issues when
an unusual allocation range is specified (effectively an off-by X
error). The current solution is to map the descriptor always, as we
always require an up-to-date version of this data. However, this also
requires an interlinked PR that adds a more intricate type of mapping of
structures/record types that clang currently implements, to circumvent
the overwriting of the pointer in the descriptor.

3/3 required PRs to enable declare target to mapping, this PR should
pass all tests and provide an all green CI.

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-24 21:22:49 +01:00
agozillon
20929abb85
[MLIR][OpenMP] Introduce overlapped record type map support (#119588)
This PR introduces a new additional type of map lowering for record
types that Clang currently supports, in which a user can map a top-level
record type and then individual members with different mapping,
effectively creating a sort of "overlapping" mapping that we attempt to
cut around.

This is currently most predominantly used in Fortran, when mapping
descriptors and there data, we map the descriptor and its data with
separate map modifiers and "cut around" the pointer data, so that wedo
not overwrite it unless the runtime deems it a neccesary action based on
its reference counting mechanism. However, it is a mechanism that will
come in handy/trigger when a user explitily maps a record type (derived
type or structure) and then explicitly maps a member with a different
map type.

These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp
file and phase, however, one Flang test that checks end-to-end IR
compilation (as far as we care for now at least) was altered.

2/3 required PRs to enable declare target to mapping, should look at PR
3/3 to check for full green passes (this one will fail a number due to
some dependencies).

Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-24 21:20:29 +01:00
Akash Banerjee
8aa7d823b0
[OpenMP][Flang] Emit default declare mappers implicitly for derived types (#140562)
This patch adds support to emit default declare mappers for implicit
mapping of derived types when not supplied by user. This especially
helps tackle mapping of allocatables of derived types.
2025-11-14 15:59:48 +00:00
Ethan Luis McDonough
38cade7cc6
[PGO][Offload] Fix missing names bug in GPU PGO (#166444)
After #163011 was merged, the tests in
[`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1)
broke because the offload plugins were no longer able to find
`__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm`
visible to the host on GPU targets and reverses the changes made in
f7e9968a5ba99521e6e51161f789f0cc1745193f.
2025-11-10 10:11:53 -06:00
Joseph Huber
aaddd8d38a [OpenMP] Fix tests relying on the heap size variable
Summary:
I made that an unimplemented error, but forgot that it was used for this
environment variable.
2025-11-06 13:00:26 -06:00
Joseph Huber
670c453aeb
[Offload] Remove handling for device memory pool (#163629)
Summary:
This was a lot of code that was only used for upstream LLVM builds of
AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so
just use that. Simplifies code, can be added back if we start providing
alternate forms but I don't think there's a single use-case that would
justify it yet.
2025-11-06 10:15:18 -06:00
agozillon
09318c6bff
[MLIR][OpenMP] Fix and simplify bounds offset calculation for 1-D GEP offsets (#165486)
Currently this is being calculated incorrectly and will result in
incorrect index offsets in more complicated array slices. This PR tries
to address it by refactoring and changing the calculation to be more
correct.
2025-10-31 00:54:31 +01:00
Jan Patrick Lehr
f7e9968a5b
[Offload] XFAIL pgo tests until resolved (#163722)
While people look into it, xfail the tests.
2025-10-16 11:43:55 +02:00
Joseph Huber
914fbe367e
[OpenMP] Disable a few more tests to get the bot green (#163614) 2025-10-15 14:14:15 -05:00
Jan Patrick Lehr
4b84e0f3f0
[OpenMP] Add test to print interop identifiers (#161434)
The test covers some of the identifier symbols in the interop runtime.

This test, for now, is to guard against complete breakage, which was the
result of the other `interop.c` test not being enabled on AMD and thus,
not caught by our buildbots.
2025-10-15 20:38:33 +02:00
agozillon
9155b318f2
[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types (#154349)
This PR adds deferral of descriptor maps until they are necessary for
assumed dummy argument types. The intent is to avoid a problem where a
user can inadvertently map a temporary local descriptor to device
without their knowledge and proceed to never unmap it. This temporary
local descriptor remains lodged in OpenMP device memory and the next
time another variable or descriptor residing in the same stack address
is mapped we incur a runtime OpenMP map error as we try to remap the
same address.

This fix was discussed with the OpenMP committee and applies to OpenMP
5.2 and below, future versions of OpenMP can avoid this issue via the
attach semantics added to the specification.
2025-10-09 17:52:41 +02:00
Akash Banerjee
ed12dc5e30
[Flang][OpenMP] Implicitly map nested allocatable components in derived types (#160766)
This PR adds support for nested derived types and their mappers to the
MapInfoFinalization pass.

- Generalize MapInfoFinalization to add child maps for arbitrarily
nested allocatables when a derived object is mapped via declare mapper.
- Traverse HLFIR designates rooted at the target block arg and build
full coordinate_of chains; append members with correct membersIndex.

This fixes #156461.
2025-10-02 16:15:16 +00:00
Joseph Huber
0fcce4fb7b
[OpenMP] Mark problematic tests as XFAIL / UNSUPPORTED (#161267)
Summary:
Several of these tests have been failing for literal years. Ideally we
make efforts to fix this, but keeping these broken has had serious
consequences on our testing infrastructure where failures are the norm
so almost all test failures are disregarded. I made a tracking issue for
the ones that have been disabled.

https://github.com/llvm/llvm-project/issues/161265
2025-09-29 15:17:55 -05:00