Joseph Huber 47b7c91abe
[libc] Rework the GPU build to be a regular target (#81921)
Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly superior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Some certain utilities need to be built with 
`--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine
workaround as we
will always assume that the GPU `libc` is a cross-build with a
functioning host.

Depends on https://github.com/llvm/llvm-project/pull/81557
2024-02-22 15:29:29 -06:00

89 lines
3.4 KiB
ReStructuredText

.. _libc_gpu_usage:
===================
Using libc for GPUs
===================
.. contents:: Table of Contents
:depth: 4
:local:
Building the GPU library
========================
LLVM's libc GPU support *must* be built with an up-to-date ``clang`` compiler
due to heavy reliance on ``clang``'s GPU support. This can be done automatically
using the LLVM runtimes support. The GPU build is done using cross-compilation
to the GPU architecture. This project currently supports AMD and NVIDIA GPUs
which can be targeted using the appropriate target name. The following
invocation will enable a cross-compiling build for the GPU architecture and
enable the ``libc`` project only for them.
.. code-block:: sh
$> cd llvm-project # The llvm-project checkout
$> mkdir build
$> cd build
$> cmake ../llvm -G Ninja \
-DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
-DLLVM_ENABLE_RUNTIMES="openmp" \
-DCMAKE_BUILD_TYPE=<Debug|Release> \ # Select build type
-DCMAKE_INSTALL_PREFIX=<PATH> \ # Where 'libcgpu.a' will live
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libc \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libc \
-DLLVM_RUNTIME_TARGETS=default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda
$> ninja install
Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
using a compatible compiler and to support ``openmp`` offloading, we list them
in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
directory in which to install the ``libcgpu-nvptx.a`` and ``libcgpu-amdgpu.a``
libraries and headers along with LLVM. The generated headers will be placed in
``include/<gpu-triple>``.
Usage
=====
Once the static archive has been built it can be linked directly
with offloading applications as a standard library. This process is described in
the `clang documentation <https://clang.llvm.org/docs/OffloadingDesign.html>`_.
This linking mode is used by the OpenMP toolchain, but is currently opt-in for
the CUDA and HIP toolchains through the ``--offload-new-driver``` and
``-fgpu-rdc`` flags. A typical usage will look this this:
.. code-block:: sh
$> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
supported target device. The supported architectures can be seen using LLVM's
``llvm-objdump`` with the ``--offloading`` flag:
.. code-block:: sh
$> llvm-objdump --offloading libcgpu.a
libcgpu.a(strcmp.cpp.o): file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind llvm ir
arch generic
triple amdgcn-amd-amdhsa
producer none
Because the device code is stored inside a fat binary, it can be difficult to
inspect the resulting code. This can be done using the following utilities:
.. code-block:: sh
$> llvm-ar x libcgpu.a strcmp.cpp.o
$> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
$> opt -S out.bc
...
Please note that this fat binary format is provided for compatibility with
existing offloading toolchains. The implementation in ``libc`` does not depend
on any existing offloading languages and is completely freestanding.