Previously, the user was not able to use more than 48 KB of shared memory on NVIDIA GPUs. In order to do so, setting the function attribute `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` is required, which was not present in the code base. With this commit, we add the ability toset this attribute, allowing the user to utilize the full power of their GPU. In order to not have to reset the function attribute for each launch of the same kernel, we keep track of the maximum memory limit (as the variable `MaxDynCGroupMemLimit`) and only set the attribute if our desired amount exceeds the limit. By default, this limit is set to 48 KB. Feedback is greatly appreciated, especially around setting the new variable as mutable. I did this becuase the `launchImpl` method is const and I am not able to modify my variable otherwise. --------- Co-authored-by: Giorgi Gvalia <ggvalia@login33.chn.perlmutter.nersc.gov> Co-authored-by: Giorgi Gvalia <ggvalia@login07.chn.perlmutter.nersc.gov>
The LLVM/Offload Subproject
The Offload subproject aims at providing tooling, runtimes, and APIs that allow users to execute code on accelerators or other "co-processors" that may or may not match the architecture of their "host". In the long run, all kinds of targets are in scope of this effort, including but not limited to: CPUs, GPUs, FPGAs, AI/ML accelerators, distributed resources, etc.
For OpenMP offload users, the project is ready and fully usable. The final API design is still under development. More content will show up here and on our webpage soon. In the meantime, people are encouraged to participate in our meetings (see below) and check our development board as well as the discussions on Discourse.
Meetings
Every second Wednesday, 7:00 - 8:00am PT, starting Jan 24, 2024. Alternates with the OpenMP in LLVM meeting. invite.ics Meeting Minutes and Agenda