27 Commits

Author SHA1 Message Date
Joseph Huber
f233a54ae8
[OpenMP] Remove usage of pointer-to-member in lookup (#123671)
Summary:
This is buggy and is currently being tracked in
https://github.com/llvm/llvm-project/issues/123241. For now, replace it
with a macro so that we can use address spaces directly.
2025-01-21 07:50:40 -06:00
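To make the change above concrete, here is a host-side C++ sketch that contrasts the two lookup styles. `ICVState`, `lookupViaMember`, and `LOOKUP_FIELD` are hypothetical names, not the DeviceRTL's real identifiers. The sketch does not reproduce the address-space bug itself. It only shows that a macro expanding to a plain member access reaches the same field without ever forming a pointer-to-member.

```cpp
#include <cstdint>

// Hypothetical state object holding two internal control variables.
struct ICVState {
  uint32_t NThreadsVar;
  uint32_t LevelVar;
};

// Old style: the caller selects the field with a pointer-to-member.
inline uint32_t &lookupViaMember(ICVState &State, uint32_t ICVState::*Var) {
  return State.*Var;
}

// New style: a macro that expands to an ordinary member access at the use
// site, so no pointer-to-member is ever formed.
#define LOOKUP_FIELD(StatePtr, Field) ((StatePtr)->Field)
```

Because the macro is expanded textually, the access goes through whatever pointer the caller passes in, so that pointer's type (including any address-space qualifier on a GPU target) is preserved.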
Joseph Huber
3274bf6b42
[OpenMP] Make each atomic helper take an atomic scope argument (#122786)
Summary:
Right now we just default to device scope for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify these and make
each version take the scope as an optional argument.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope; just pass `system` instead.
2025-01-20 21:58:27 -06:00
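Here is a minimal sketch of a helper that takes an optional scope and defaults to device scope. It assumes Clang's scoped GNU-style atomics (`__scoped_atomic_fetch_add` and the `__MEMORY_SCOPE_*` macros, available in recent Clang releases). When those macros are absent, as with GCC, the hypothetical `MemScope` argument is accepted but ignored, and the plain `__atomic_fetch_add` builtin is used instead.

```cpp
// Hypothetical scope enumeration. When Clang's __MEMORY_SCOPE_* macros are
// available, the enumerators take their values so they can be passed
// straight to the scoped builtins.
enum class MemScope : int {
#if defined(__MEMORY_SCOPE_SYSTEM)
  system = __MEMORY_SCOPE_SYSTEM,
  device = __MEMORY_SCOPE_DEVICE,
  workgroup = __MEMORY_SCOPE_WRKGRP,
#else
  system,
  device,
  workgroup,
#endif
};

// Every helper takes the scope as an optional trailing argument, defaulting
// to device scope, instead of hard-coding a scope per helper.
template <typename T>
T atomicAdd(T *Address, T Val, int Ordering = __ATOMIC_SEQ_CST,
            MemScope Scope = MemScope::device) {
#if defined(__MEMORY_SCOPE_SYSTEM)
  return __scoped_atomic_fetch_add(Address, Val, Ordering,
                                   static_cast<int>(Scope));
#else
  (void)Scope; // GNU __atomic builtins have no scope parameter.
  return __atomic_fetch_add(Address, Val, Ordering);
#endif
}
```

With this shape, a caller that needs system scope writes `atomicAdd(Ptr, V, __ATOMIC_SEQ_CST, MemScope::system)` rather than calling a separate `add_system` variant.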
Joseph Huber
2d9f406943
[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670)
Summary:
We used to avoid a lot of this stuff because we didn't properly handle
variadics in device code. That's been solved for now, so we can just
make an internal printf handler that forwards to the external `vprintf`
function. This is either provided by NVIDIA's SDK or by the GPU libc
implementation.

The main reason for doing this is that it prevents the stupid AMDGPU
printf pass from mangling our beautiful printfs!
2025-01-20 21:56:46 -06:00
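The forwarding described above can be sketched on the host, where the C library's `vprintf` stands in for the one provided by NVIDIA's SDK or by the GPU libc. `rtlPrintf` is a hypothetical name, not the runtime's actual entry point.

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical internal printf handler: collect the variadic arguments into
// a va_list and forward them to an external vprintf implementation.
int rtlPrintf(const char *Format, ...) {
  va_list Args;
  va_start(Args, Format);
  int Written = std::vprintf(Format, Args);
  va_end(Args);
  return Written; // number of characters written, as printf reports it
}
```

Since the call that reaches the backend is `vprintf` with an opaque `va_list`, a pass that pattern-matches calls to `printf` itself would have nothing to rewrite.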
Joseph Huber
723a3e746a
[OpenMP] Fix misspelled attribute and warning
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but it wasn't
included in the list of known assumptions. Fix that so the test now
works.
2025-01-20 08:40:19 -06:00
Joseph Huber
58af82b462
[OpenMP] Remove 'omp assumes' scopes now that we have no inline ASM (#123611)
Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced each one with a builtin, so this can simply be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This triggers an "unknown assumption" warning (even
though the assumption is a known one), so I'm silencing that warning for
now until I fix it later.

---------

Co-authored-by: Michael Kruse <github@meinersbur.de>
2025-01-20 08:11:06 -06:00
Joseph Huber
1c00d0d776
[OpenMP] Remove hack around missing atomic load (#122781)
Summary:
We used to do a fetch add of zero to approximate a load. This is because
the NVPTX backend didn't handle this properly. It's not an issue anymore
so simply use the proper atomic builtin.
2025-01-16 15:17:15 -06:00
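For illustration, here is a host sketch of the old workaround next to its replacement, using the GNU `__atomic` builtins that both GCC and Clang provide. The function names are hypothetical, and acquire ordering is an arbitrary choice for the example. Both functions return the current value, but the fetch-add form performs a read-modify-write to get it.

```cpp
#include <cstdint>

// Old workaround: emulate a load with an atomic fetch-add of zero. This is a
// read-modify-write, so it requires write access to the location.
inline uint32_t loadViaFetchAdd(uint32_t *Address) {
  return __atomic_fetch_add(Address, uint32_t{0}, __ATOMIC_ACQUIRE);
}

// Replacement: the proper atomic load builtin, which only reads.
inline uint32_t loadDirect(const uint32_t *Address) {
  return __atomic_load_n(Address, __ATOMIC_ACQUIRE);
}
```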
Joseph Huber
74d5373f49
[OpenMP] Fix missing type getter for SFINAE helper
Summary:
This didn't get the type, which made using this always return false.
2025-01-10 19:35:41 -06:00
Joseph Huber
f53cb84df6
[OpenMP] Use __builtin_bit_cast instead of UB type punning (#122325)
Summary:
Use a normal bitcast and remove it from the shared utils, since
`__builtin_bit_cast` is not available in GCC 7.4.
2025-01-09 13:59:21 -06:00
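Here is a small sketch of the replacement idiom. `floatBits` and `floatFromBits` are hypothetical helpers. Reading a `float` through a `uint32_t *` violates strict aliasing, while `__builtin_bit_cast` produces a value of the destination type with the same object representation. Clang and GCC 11 and later support it, which is consistent with the note above that GCC 7.4 does not.

```cpp
#include <cstdint>

// Reinterpret the bits of a float as an unsigned 32-bit integer without
// undefined behavior.
inline uint32_t floatBits(float F) {
  static_assert(sizeof(float) == sizeof(uint32_t), "size mismatch");
  return __builtin_bit_cast(uint32_t, F);
}

// The inverse conversion: build a float from its raw IEEE-754 bits.
inline float floatFromBits(uint32_t Bits) {
  return __builtin_bit_cast(float, Bits);
}
```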
Joseph Huber
b57c0bac81
[OpenMP] Update atomic helpers to just use headers (#122185)
Summary:
Previously we had some indirection here, this patch updates these
utilities to just be normal template functions. We use SFINAE to manage
the special case handling for floats. Also this strips address spaces so
it can be used more generally.
2025-01-09 13:57:39 -06:00
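Here is a minimal host-side sketch of the pattern. It uses ordinary function templates, where `std::enable_if_t` selects an integral overload backed by `__atomic_fetch_add` and a floating-point overload built from a compare-and-swap loop. `fetchAdd` is a hypothetical name. The CAS loop is one common way to handle floats, not necessarily what the runtime does, and address-space stripping is not shown.

```cpp
#include <type_traits>

// Integral types: forward straight to the compiler's fetch-add builtin.
template <typename T>
std::enable_if_t<std::is_integral_v<T>, T> fetchAdd(T *Address, T Val) {
  return __atomic_fetch_add(Address, Val, __ATOMIC_SEQ_CST);
}

// Floating-point types: the GNU __atomic_fetch_add builtin is not guaranteed
// to accept them, so emulate the add with a compare-and-swap loop.
template <typename T>
std::enable_if_t<std::is_floating_point_v<T>, T> fetchAdd(T *Address, T Val) {
  T Expected;
  __atomic_load(Address, &Expected, __ATOMIC_RELAXED);
  T Desired;
  do {
    Desired = Expected + Val;
    // On failure, Expected is refreshed with the value currently stored.
  } while (!__atomic_compare_exchange(Address, &Expected, &Desired,
                                      /*weak=*/false, __ATOMIC_SEQ_CST,
                                      __ATOMIC_RELAXED));
  return Expected; // the value before the add, matching fetch-add semantics
}
```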
Joseph Huber
f4ee5a673f
[OpenMP] Replace AMDGPU fences with generic scoped fences (#119619)
Summary:
This is simpler and more common. I would have replaced the CUDA uses as
well to make them the same, but that path currently doesn't fully codegen
these fences and just emits a full system-wide barrier as a fallback.
2024-12-12 07:54:51 -06:00
Joseph Huber
c3ac3fe825
[OpenMP] Fix redefining stdint.h types (#108607)
Summary:
We can include `stdint.h` just fine as long as we don't allow it to find
system headers. Passing `-nostdlibinc` and `-nogpuinc` suppresses these
extra paths, so we just use the clang resource headers for `stdint.h` and
`stddef.h`.
2024-09-13 13:22:44 -05:00
Johannes Doerfert
08533a3ee8
[Offload][NFC] Reorganize utils:: and make Device/Host/Shared clearer (#100280)
We had three `utils::` namespaces, all with different "meaning" (host,
device, hsa_utils). We should, when we can, keep "include/Shared"
accessible from host and device, thus RefCountTy has been moved to a
separate header. `hsa_utils` was introduced to make `utils::` less
overloaded, and common functionality was de-duplicated, e.g.,
`utils::advance` and `utils::advanceVoidPtr` -> `utils::advancePtr`. Type
punning now checks the size of the result to make sure it matches
the source type.

No functional change was intended.
2024-09-05 13:36:26 -07:00
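The size check mentioned above amounts to a compile-time assertion made before the bytes are reinterpreted. The sketch below uses the hypothetical name `punChecked` and copies the bytes with `std::memcpy`, a well-defined alternative to pointer casts. It is not the runtime's own helper.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical size-checked pun: compilation fails unless both types have
// the same size, so a mismatched conversion cannot silently truncate or
// over-read.
template <typename To, typename From>
To punChecked(const From &Value) {
  static_assert(sizeof(To) == sizeof(From),
                "type punning requires matching sizes");
  static_assert(std::is_trivially_copyable_v<To> &&
                    std::is_trivially_copyable_v<From>,
                "type punning requires trivially copyable types");
  To Result;
  std::memcpy(&Result, &Value, sizeof(To));
  return Result;
}
```

A call such as `punChecked<double>(1.0f)` is rejected at compile time instead of reading four bytes past the source object.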
Ethan Luis McDonough
fde2d23ee2
[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.

> This pull request is the first part of an ongoing effort to extend
> PGO instrumentation to GPU device code. This PR makes the following
> changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
>   GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
>   dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
> while targeting a GPU.
2024-08-22 01:10:54 -05:00
Joseph Huber
74d23f15b6
[OpenMP] Implement 'omp_alloc' on the device (#102526)
Summary:
The 'omp_alloc' function should be callable from a target region. This
patch implements it by simply calling `malloc` for every non-default
trait value allocator. All the special access modifiers are
unimplemented and return null. The null allocator returns null as the
spec states it should not be usable from the target.
2024-08-14 13:38:55 -05:00
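The dispatch can be sketched as a switch over allocator handles. The `Allocator` enumeration and `deviceAlloc` are hypothetical stand-ins for `omp_allocator_handle_t` and `omp_alloc`. The sketch assumes that exactly the five predefined allocators without cgroup, pteam, or thread access traits are backed by `malloc`, which is an interpretation of the description above rather than a copy of the runtime's code.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for the predefined OpenMP allocator handles; the
// real handle values come from omp.h.
enum class Allocator {
  null_allocator,
  default_mem,
  large_cap_mem,
  const_mem,
  high_bw_mem,
  low_lat_mem,
  cgroup_mem,
  pteam_mem,
  thread_mem,
};

// Allocators with ordinary access are backed by malloc. The special-access
// allocators are unimplemented and return null, as does the null allocator.
void *deviceAlloc(std::size_t Size, Allocator A) {
  switch (A) {
  case Allocator::default_mem:
  case Allocator::large_cap_mem:
  case Allocator::const_mem:
  case Allocator::high_bw_mem:
  case Allocator::low_lat_mem:
    return std::malloc(Size);
  default:
    return nullptr;
  }
}
```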
Joseph Huber
dbb8b7a0f4
Reapply "[OpenMP][libc] Remove special handling for OpenMP printf (#98940)"
This reverts commit fea5914c926e2f013a8b5e27eaa74c7047fb2c71.
2024-07-26 17:21:56 -05:00
Joseph Huber
fea5914c92
Revert "[OpenMP][libc] Remove special handling for OpenMP printf (#98940)"
This reverts commit 069e8bcd82c4420239f95c7e6a09e1f756317cfc.

Summary:
Some tests failing, revert this for now.
2024-07-26 16:39:12 -05:00
Joseph Huber
069e8bcd82
[OpenMP][libc] Remove special handling for OpenMP printf (#98940)
Summary:
Currently there are several layers to handle `printf`. Since we now have
varargs and an implementation of `printf` this can be heavily
simplified.

1. The frontend renames `printf` into `omp_vprintf` and gives it an
   argument buffer.

Removing 1. triggered some code in the AMDGPU backend meant for HIP /
OpenCL, so I added an exception to it.

2. Forward this to CUDA vprintf or ignore it.

We no longer need special handling for it since we have varargs. So now
we just forward this to CUDA vprintf if we have libc, otherwise just
leave `printf` as an external function and expect that `libc` will be
linked in.
2024-07-26 16:03:36 -05:00
Gheorghe-Teodor Bercea
1a478a69bc
[OpenMP][offload] Fix dynamic schedule tracking (#97065)
This patch fixes the dynamic schedule tracking.
2024-07-01 10:23:11 -04:00
Ethan Luis McDonough
2c8b912f63
Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4
[PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extend PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
  info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
Shilei Tian
b448efb8ea
Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)" (#94139)
2024-06-03 11:17:36 -04:00
Shilei Tian
cf9eeb67e5
Revert "Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)""
This reverts commit 7b4865582299294455bc816358fd88a9c6e5e0be.
2024-05-26 01:04:39 -04:00
Shilei Tian
7b48655822
Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)"
This reverts commit 9b31cc71d66064dfaf2afabf4a835211321bb4a0.
2024-05-26 00:57:50 -04:00
Joseph Huber
9b31cc71d6
Revert "[OpenMP][OMPX] Add shfl_down_sync (#93311)"
This reverts commit 098c6dfa8157681699a71fce9e3d94515e66311f.
This reverts commit 8c718a3a91df4ab68dc3f1ca3887ea730c9aed84.
This reverts commit 4fb02de9d490d0773441aa30124bb4d1272230d3.
2024-05-24 19:07:53 -05:00
Shilei Tian
4fb02de9d4
[OpenMP][OMPX] Add shfl_down_sync (#93311)
2024-05-24 14:00:43 -04:00
Shilei Tian
7eeec8e6d1
[OpenMP][OMPX] Add ballot_sync (#91297)
This patch adds the support for `ballot_sync` in ompx.
2024-05-24 09:54:54 -04:00
Johannes Doerfert
330d8983d2
[Offload] Move /openmp/libomptarget to /offload (#75125)
In a nutshell, this moves our libomptarget code to populate the offload
subproject.

With this commit, users need to enable the new LLVM/Offload subproject
as a runtime in their cmake configuration.
No further changes are expected for downstream code.

Tests and other components still depend on OpenMP and have also not been
renamed. The results below are for a build in which OpenMP and Offload
are enabled runtimes. In addition to the pure `git mv`, we needed to
adjust some CMake files. Nothing is intended to change semantics.

```
ninja check-offload
```
Works with the X86 and AMDGPU offload tests

```
ninja check-openmp
```
Still works but doesn't build offload tests anymore.

```
ls install/lib
```
Shows all expected libraries, incl.
- `libomptarget.devicertl.a`
- `libomptarget-nvptx-sm_90.bc`
- `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git`
- `libomptarget.so` -> `libomptarget.so.18git`

Fixes: https://github.com/llvm/llvm-project/issues/75124

---------

Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>
2024-04-22 09:51:33 -07:00