History

[Offload] Allow CUDA Kernels to use arbitrarily large shared memory (#145963 )

Previously, the user was not able to use more than 48 KB of shared
memory on NVIDIA GPUs. In order to do so, setting the function attribute
`CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES` is required, which was
not present in the code base. With this commit, we add the ability to set
this attribute, allowing the user to utilize the full power of their
GPU.

In order to not have to reset the function attribute for each launch of
the same kernel, we keep track of the maximum memory limit (as the
variable `MaxDynCGroupMemLimit`) and only set the attribute if our
desired amount exceeds the limit. By default, this limit is set to 48
KB.

Feedback is greatly appreciated, especially around setting the new
variable as mutable. I did this because the `launchImpl` method is const
and I am not able to modify my variable otherwise.
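
The caching scheme described above can be sketched roughly as follows. This is
a minimal, self-contained illustration and not the code from the commit:
`KernelSketch`, `launch`, and the `SetMaxDynSharedBytes` callback are
hypothetical names, and the callback stands in for the CUDA driver call that
would set the function attribute. Only `MaxDynCGroupMemLimit`, the 48 KB
default, and the use of `mutable` in a const launch method come from the text.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the plugin's kernel class (assumption, not LLVM code).
struct KernelSketch {
  // Assumed hook standing in for a driver call that raises the kernel's
  // dynamic shared memory limit to the given number of bytes.
  std::function<void(uint32_t)> SetMaxDynSharedBytes;

  // Largest limit configured so far; starts at the 48 KB default.
  // `mutable` lets the const launch method below update this cache.
  mutable uint32_t MaxDynCGroupMemLimit = 48 * 1024;

  // Const, like the `launchImpl` method mentioned in the commit message.
  void launch(uint32_t DynSharedBytes) const {
    // Only touch the attribute when the request exceeds the cached limit,
    // so repeated launches of the same kernel do not reset it each time.
    if (DynSharedBytes > MaxDynCGroupMemLimit) {
      SetMaxDynSharedBytes(DynSharedBytes);
      MaxDynCGroupMemLimit = DynSharedBytes;
    }
    // ... the actual kernel launch would follow here ...
  }
};
```

In this sketch, a launch that requests 64 KB triggers one attribute update, and
later launches at or below 64 KB skip it. Dropping `mutable` would require
either a non-const launch method or moving the cached limit outside the object.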

---------

Co-authored-by: Giorgi Gvalia <ggvalia@login33.chn.perlmutter.nersc.gov>
Co-authored-by: Giorgi Gvalia <ggvalia@login07.chn.perlmutter.nersc.gov>

2025-07-07 15:26:16 -04:00

cmake

[Offload][cmake] Add GPU test job limit for AMDGPU buildbot cmake cache (#146611 )

2025-07-01 19:18:28 -05:00

DeviceRTL

[Offload] Fix cmake warning (#145488 )

2025-06-24 13:42:03 +01:00

docs

[Offload][NFC] Factor out and rename the __tgt_offload_entry struct (#123785 )

2025-01-21 12:05:24 -06:00

include

[Offload] Don't check in generated files (#141982 )

2025-06-03 10:39:04 -05:00

liboffload

[Offload] Add missing license header to Common.td (#146737 )

2025-07-02 17:17:30 +01:00

libomptarget

[OpenMP] Fix crash with duplicate mapping on target directive (#146136 )

2025-06-29 22:41:24 +01:00

plugins-nextgen

[Offload] Allow CUDA Kernels to use arbitrarily large shared memory (#145963 )

2025-07-07 15:26:16 -04:00

test

[libomptarget] Add a test for OMP_TARGET_OFFLOAD=disabled (#146385 )

2025-06-30 13:29:36 -05:00

tools

[Offload] Add MAX_WORK_GROUP_SIZE device info query (#143718 )

2025-07-02 16:33:54 +01:00

unittests

[Offload] Add liboffload unit tests for shared/local memory (#147040 )

2025-07-07 16:20:02 +01:00

utils

…

CMakeLists.txt

[Offload] Add OFFLOAD_INCLUDE_TESTS (#143388 )

2025-06-09 10:27:40 -05:00

Maintainers.md

[Offload] Add 'Maintainers.md' file for offload (#138177 )

2025-05-01 14:06:33 -05:00

README.md

[Offload][NFC] Update README.md

2024-11-17 07:32:29 -08:00

README.txt

…

README.md

The LLVM/Offload Subproject

The Offload subproject aims at providing tooling, runtimes, and APIs that allow users to execute code on accelerators or other "co-processors" that may or may not match the architecture of their "host". In the long run, all kinds of targets are in scope of this effort, including but not limited to: CPUs, GPUs, FPGAs, AI/ML accelerators, distributed resources, etc.

For OpenMP offload users, the project is ready and fully usable. The final API design is still under development. More content will show up here and on our webpage soon. In the meantime, people are encouraged to participate in our meetings (see below) and check our development board as well as the discussions on Discourse.

Meetings

Every second Wednesday, 7:00 - 8:00am PT, starting Jan 24, 2024. Alternates with the OpenMP in LLVM meeting. Links: invite.ics, Meeting Minutes and Agenda.