Summary: This is a massive patch because it reworks the entire build and everything that depends on it. This is not split up because various bots would fail otherwise. I will attempt to describe the necessary changes here. This patch completely reworks how the GPU build is built and targeted. Previously, we used a standard runtimes build and handled both NVPTX and AMDGPU in a single build via multi-targeting. This added a lot of divergence in the build system and prevented us from doing various things like building for the CPU / GPU at the same time, or exporting the startup libraries or running tests without a full rebuild. The new appraoch is to handle the GPU builds as strict cross-compiling runtimes. The first step required https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC` target to build for the GPU without touching the other targets. This means that the GPU uses all the same handling as the other builds in `libc`. The new expected way to build the GPU libc is with `LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`. The second step was reworking how we generated the embedded GPU library by moving it into the library install step. Where we previously had one `libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This patch includes the necessary clang / OpenMP changes to make that not break the bots when this lands. We unfortunately still require that the NVPTX target has an `internal` target for tests. This is because the NVPTX target needs to do LTO for the provided version (The offloading toolchain can handle it) but cannot use it for the native toolchain which is used for making tests. This approach is vastly superior in every way, allowing us to treat the GPU as a standard cross-compiling target. We can now install the GPU utilities to do things like use the offload tests and other fun things. Some certain utilities need to be built with `--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine workaround as we will always assume that the GPU `libc` is a cross-build with a functioning host. Depends on https://github.com/llvm/llvm-project/pull/81557
89 lines
3.4 KiB
ReStructuredText
89 lines
3.4 KiB
ReStructuredText
.. _libc_gpu_usage:
|
|
|
|
|
|
===================
|
|
Using libc for GPUs
|
|
===================
|
|
|
|
.. contents:: Table of Contents
|
|
:depth: 4
|
|
:local:
|
|
|
|
Building the GPU library
|
|
========================
|
|
|
|
LLVM's libc GPU support *must* be built with an up-to-date ``clang`` compiler
|
|
due to heavy reliance on ``clang``'s GPU support. This can be done automatically
|
|
using the LLVM runtimes support. The GPU build is done using cross-compilation
|
|
to the GPU architecture. This project currently supports AMD and NVIDIA GPUs
|
|
which can be targeted using the appropriate target name. The following
|
|
invocation will enable a cross-compiling build for the GPU architecture and
|
|
enable the ``libc`` project only for them.
|
|
|
|
.. code-block:: sh
|
|
|
|
$> cd llvm-project # The llvm-project checkout
|
|
$> mkdir build
|
|
$> cd build
|
|
$> cmake ../llvm -G Ninja \
|
|
-DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
|
|
-DLLVM_ENABLE_RUNTIMES="openmp" \
|
|
-DCMAKE_BUILD_TYPE=<Debug|Release> \ # Select build type
|
|
-DCMAKE_INSTALL_PREFIX=<PATH> \ # Where 'libcgpu.a' will live
|
|
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libc \
|
|
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libc \
|
|
-DLLVM_RUNTIME_TARGETS=default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda
|
|
$> ninja install
|
|
|
|
Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
|
|
toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
|
|
using a compatible compiler and to support ``openmp`` offloading, we list them
|
|
in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
|
|
newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
|
|
directory in which to install the ``libcgpu-nvptx.a`` and ``libcgpu-amdgpu.a``
|
|
libraries and headers along with LLVM. The generated headers will be placed in
|
|
``include/<gpu-triple>``.
|
|
|
|
Usage
|
|
=====
|
|
|
|
Once the static archive has been built it can be linked directly
|
|
with offloading applications as a standard library. This process is described in
|
|
the `clang documentation <https://clang.llvm.org/docs/OffloadingDesign.html>`_.
|
|
This linking mode is used by the OpenMP toolchain, but is currently opt-in for
|
|
the CUDA and HIP toolchains through the ``--offload-new-driver``` and
|
|
``-fgpu-rdc`` flags. A typical usage will look this this:
|
|
|
|
.. code-block:: sh
|
|
|
|
$> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
|
|
|
|
The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
|
|
supported target device. The supported architectures can be seen using LLVM's
|
|
``llvm-objdump`` with the ``--offloading`` flag:
|
|
|
|
.. code-block:: sh
|
|
|
|
$> llvm-objdump --offloading libcgpu.a
|
|
libcgpu.a(strcmp.cpp.o): file format elf64-x86-64
|
|
|
|
OFFLOADING IMAGE [0]:
|
|
kind llvm ir
|
|
arch generic
|
|
triple amdgcn-amd-amdhsa
|
|
producer none
|
|
|
|
Because the device code is stored inside a fat binary, it can be difficult to
|
|
inspect the resulting code. This can be done using the following utilities:
|
|
|
|
.. code-block:: sh
|
|
|
|
$> llvm-ar x libcgpu.a strcmp.cpp.o
|
|
$> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
|
|
$> opt -S out.bc
|
|
...
|
|
|
|
Please note that this fat binary format is provided for compatibility with
|
|
existing offloading toolchains. The implementation in ``libc`` does not depend
|
|
on any existing offloading languages and is completely freestanding.
|