This commit adds driver support for linking libclc OpenCL libraries. It
takes the form of a new optional flag: --libclc-lib=namespec. Nothing is
linked unless this flag is specified.
Not all libclc targets have corresponding clang targets. For this reason
it is desirable for users to be able to specify a libclc library name.
We support this by accepting either a library name (without the .bc
suffix) or a filename. Both of these are searched for in the clang
resource directory. Filenames are also checked as given, so that
absolute paths can be provided. The syntax for specifying filenames
(as opposed to library names) uses a leading colon (:), inspired by
the -l option.
To accommodate this option, libclc libraries are now placed into clang's
resource directory in an in-tree configuration. The libraries are all
placed in <resource-dir>/lib/libclc and
are not grouped under host-specific directories as some other runtime
libraries are; it is not expected that OpenCL libraries will differ
depending on the host toolchain.
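A minimal sketch of the namespec lookup, assuming candidates in the resource directory are tried before a filename is checked as given. `resolveLibclcLib` and the injected `Exists` callback are hypothetical and not clang's driver API; the callback only keeps the example free of real filesystem state.

```cpp
#include <filesystem>
#include <functional>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve the value of --libclc-lib=<namespec>.
// Exists() is injected so the lookup can be exercised without real files.
std::optional<fs::path>
resolveLibclcLib(const std::string &NameSpec, const fs::path &ResourceDir,
                 const std::function<bool(const fs::path &)> &Exists) {
  if (NameSpec.empty())
    return std::nullopt;
  const fs::path LibclcDir = ResourceDir / "lib" / "libclc";
  if (NameSpec.front() == ':') {
    // ":<filename>" names a file, by analogy with -l:<filename>.
    const fs::path File = NameSpec.substr(1);
    if (File.empty())
      return std::nullopt;
    // Assumed order: resource directory first, then the name as given,
    // which is what lets absolute paths work.
    if (Exists(LibclcDir / File))
      return LibclcDir / File;
    if (Exists(File))
      return File;
    return std::nullopt;
  }
  // A plain library name is looked up with the ".bc" suffix appended.
  const fs::path Lib = LibclcDir / (NameSpec + ".bc");
  if (Exists(Lib))
    return Lib;
  return std::nullopt;
}
```

With this model, `--libclc-lib=example` maps to `<resource-dir>/lib/libclc/example.bc`, while `--libclc-lib=:/abs/custom.bc` can name a file anywhere.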
Currently only the AMDGPU toolchain supports this option as a proof of
concept. Other targets such as NVPTX or SPIR/SPIR-V could support it
too. We could optionally let target toolchains search for libclc
libraries themselves, possibly when passed an empty --libclc-lib.
This patch provides a single point for handling the logic behind
choosing common bitcode libraries. The intention is that users of the
ROCm installation detector will not have to rewrite option-handling
code each time the bitcode libraries are queried. This is close to the
detectors for other architectures, which encapsulate a similar
decision-making process behind a cleaner interface. The only flag left
in `getCommonBitcodeLibs` (the main point of entry) is `NeedsASanRT`.
This is deliberate, as calculating it requires consulting the
`ToolChain`.
Summary:
This patch reworks how we create offloading toolchains. Previously we
would handle this separately for all the different kinds. This patch
instead changes this to use the target triple and the offloading kind to
determine the proper toolchain. In the old case where the user only
passes `--offload-arch` we instead infer the triple from the passed
arguments. This is a pretty major overhaul but currently passes all the
clang tests with only minor changes to error messages.
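As a rough illustration of inferring the triple when only `--offload-arch` is given, the sketch below maps AMD `gfx*` names to `amdgcn-amd-amdhsa` and NVIDIA `sm_*` names to `nvptx64-nvidia-cuda`. The function is hypothetical, and the prefix check stands in for a proper lookup of known architectures; values such as `native` are not handled.

```cpp
#include <optional>
#include <string>

// Hypothetical sketch: choose an offloading triple from an --offload-arch
// value when no explicit triple was given.
std::optional<std::string> inferOffloadTriple(const std::string &Arch) {
  auto StartsWith = [&Arch](const std::string &Prefix) {
    return Arch.compare(0, Prefix.size(), Prefix) == 0;
  };
  if (StartsWith("gfx"))  // AMD GPUs, e.g. gfx90a or gfx90a:xnack+
    return std::string("amdgcn-amd-amdhsa");
  if (StartsWith("sm_"))  // NVIDIA GPUs, e.g. sm_80
    return std::string("nvptx64-nvidia-cuda");
  return std::nullopt;    // unknown: leave the diagnostic to the caller
}
```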
Summary:
These commands both do the same thing, so they now behave like a single
tool. When invoked as `nvptx-arch` or `amdgpu-arch`, the tool only emits
architectures for the vendor in that name.
Currently, the only option for disabling the default CUDA/HIP wrapper
headers is -nogpuinc. However, there are situations where -nogpuinc
needs to be overridden to enable the CUDA/HIP wrapper headers. This
patch adds --[no-]offload-inc for that purpose. When both exist, the
last one wins. -nogpuinc and -nocudainc are now aliases of
--no-offload-inc.
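The "last one wins" rule can be modeled as a single pass over the arguments. This is a hypothetical sketch on plain strings, not clang's option-table code; it treats -nogpuinc and -nocudainc as aliases of --no-offload-inc and assumes the headers are enabled by default.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: decide whether the CUDA/HIP wrapper headers are
// included. Later flags override earlier ones.
bool useOffloadWrapperHeaders(const std::vector<std::string> &Args) {
  bool Enabled = true;  // wrapper headers are included by default
  for (const std::string &A : Args) {
    if (A == "--offload-inc")
      Enabled = true;
    else if (A == "--no-offload-inc" || A == "-nogpuinc" || A == "-nocudainc")
      Enabled = false;
  }
  return Enabled;
}
```

For example, `-nogpuinc --offload-inc` enables the headers again, while `--offload-inc -nocudainc` disables them.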
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that parses and
verifies their incoming options, e.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is little or no maintenance burden in keeping all these
parsing functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify that their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all, e.g.:
`clang -cc1 -mprefer-vector-width=junk test.c`
With this patch, we'll now be able to tighten up incoming options to
the Frontend Drivers in a lightweight way.
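One way a shared helper could reject the junk value above is sketched here, assuming `-mprefer-vector-width=` accepts either `none` or a decimal integer. The helper name is hypothetical and operates on plain strings for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical shared validator. Assumption: valid values are "none" or
// a non-empty run of decimal digits.
bool isValidPreferVectorWidth(const std::string &Value) {
  if (Value == "none")
    return true;
  if (Value.empty())
    return false;
  for (unsigned char C : Value)
    if (!std::isdigit(C))
      return false;
  return true;
}
```

Calling the same function from the Compiler Driver and the Frontend Drivers is what would keep their notions of "well-formed" from drifting apart.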
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
This reverts commit 028429ac452acde227ae0bfafbfe8579c127e1ea and
1004fae222efeee215780c4bb4e64eb82b07fb4f.
These really need to be part of the compiler distribution. Bots are
relying on a nearly year-old version to provide bitcode.
#130963 switches the default to COV6, which requires ROCm 6.3.
Currently, if the device libraries for COV6 are not found, the error
message is not very helpful.
This PR provides a more informative error message in such cases.
There is special logic to detect the hip runtime when llvm is installed
with Spack. It works by matching the install prefix of llvm against
`llvm-amdgpu-*` followed by effectively globbing for
```
<llvm dir>/../hip-x.y.z-*/
```
and checking there is exactly one such directory.
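To make the heuristic described above concrete, here is a rough model that works on directory names instead of the real filesystem. `findSpackHipDir` is hypothetical, and it checks only the `hip-` prefix rather than the full `hip-x.y.z-*` pattern.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: given the name of the LLVM install prefix and the
// names of its sibling directories, return the single matching HIP
// directory, or nothing if the heuristic does not apply.
std::optional<std::string>
findSpackHipDir(const std::string &LlvmDirName,
                const std::vector<std::string> &Siblings) {
  auto StartsWith = [](const std::string &S, const std::string &Prefix) {
    return S.compare(0, Prefix.size(), Prefix) == 0;
  };
  if (!StartsWith(LlvmDirName, "llvm-amdgpu-"))
    return std::nullopt;
  std::optional<std::string> Match;
  for (const std::string &Dir : Siblings) {
    if (!StartsWith(Dir, "hip-"))  // simplified stand-in for hip-x.y.z-*
      continue;
    if (Match)                     // a second candidate: ambiguous, give up
      return std::nullopt;
    Match = Dir;
  }
  return Match;
}
```

Note that the result depends on which unrelated packages happen to be installed next to LLVM, which is the root of the objections below.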
I suggest removing autodetection for the following reasons:
1. In the Spack ecosystem, it's by design that every package lives in
its own prefix and only knows where its dependencies are
installed; it has no knowledge of its dependents or where they are
installed. This heuristic detection breaks that invariant, since `hip`
is a dependent of `llvm`, and can be surprising to Spack users.
2. The detection can lead to false positives, since users can be using
an llvm installed "upstream" with their own build of hip locally, and
they may not realize that clang is picking up upstream hip instead of
their local copy.
3. It only works if the directory name is `llvm-amdgpu-*`, which happens
to be the name of AMD's fork of `llvm`, so it makes no sense for
this code to live in the main LLVM repo, whose Spack package
name is `llvm`. It feels wrong that LLVM knows about Spack package
names, which can change over time.
4. Users can change the install directory structure, meaning that this
detection is not robust under config changes in Spack.
Summary:
The https://github.com/llvm/llvm-project/pull/128509 patch introduced
`--flto-partitions`. This was marked as a HIP-only argument, and was
also spelled and handled incorrectly for an `-f` option. This patch
makes the handling generic for `ld.lld` consumers.
This also fixes the flags being emitted after the default arguments,
which prevented users from overriding them. It also forwards things
properly for the new driver so we can test this.
Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU, which caused some issues with the `libc` project. The option to
disable this is HIP-only, so I think for now we should restrict this
just to HIP.
I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the `--target=amdgcn-amd-amdhsa` for its output,
which means we'd need to forward the default somehow.
The default number of partitions is the number of cores on the machine
with a cap at 16, as going above 16 is unlikely to be useful in the
common case.
Adds a flto-partitions option to override the number of partitions
easily (without having to use -Xoffload-linker). Setting it to 1
effectively disables module splitting.
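The partition-count policy can be summarized as a small function: an explicit value wins, otherwise one partition per core with a cap of 16. `ltoPartitions` is hypothetical, and returning 1 when the core count is unknown (0) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical sketch of choosing the number of LTO partitions.
unsigned ltoPartitions(std::optional<unsigned> UserValue, unsigned Cores) {
  if (UserValue)
    return *UserValue;          // explicit override; 1 disables splitting
  if (Cores == 0)               // core count unknown (assumed fallback)
    return 1;
  return std::min(Cores, 16u);  // one per core, capped at 16
}
```

A caller would pass something like `std::thread::hardware_concurrency()` as `Cores`; that function may itself return 0 when the value is not computable, hence the fallback.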
Fixes SWDEV-506214
Summary:
We support `nogpulib` to disable implicit libraries. In the future we
will want to change the default linking of these libraries based on the
user language. This patch just introduces a positive variant, so we
can now do `-nogpulib -gpulib` to re-enable it.
A later patch will make the default a variable in the ROCmToolChain
depending on the target languages.
- This reverts commit 0c6c4a9993.
- Add '-mcode-object-version=5' to explicitly use code object
version 5, matching the 'FAIL' diagnostic.
- Add a REQUIRES directive so that the lit test only runs on platforms
with the x86_64 and amdgpu targets registered.
Summary:
This probably wasn't the intended result, but the code here causes
OpenMP to always link in `ockl.bc` which was intentionally not linked.
This results in the OCKL definitions conflicting with the OpenMP ones
and also prevents them from being optimized out (Might be fixed with
newer ROCm that actually builds the visibility correctly).
I'm pretty sure the only reason this didn't break the tests is because
we're smart and pass `-nogpulib` there to keep the environment from
being poisoned with stuff like this.
ASan bitcode linking is currently available for HIPAMD, OpenMP and
OpenCL. This patch moves the common sanitizer-specific parts of the
logic into appropriate APIs to reduce code redundancy and improve
maintainability.
Summary:
The C library for GPUs provides the ability to target regular C/C++
programs by providing the C library and a file containing kernels that
call the `main` function. This is mostly used for unit tests. This patch
provides a quick way to add them without needing to know the paths. I
currently do this explicitly, but according to the libc++ contributors
we shouldn't need to specify these paths manually. See the
discussion in https://github.com/llvm/llvm-project/pull/104515.
I just default to `lib/` if the target-specific one isn't found because
the linker will give a reasonable error message if the file is not
found. The basic use case looks like this:
```console
$ clang test.c --target=amdgcn-amd-amdhsa -mcpu=native -startfiles -stdlib
$ amdhsa-loader a.out
PASS!
```
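The fallback described above can be sketched as follows, assuming the target-specific location is `<install>/lib/<triple>/` (the layout the `libc` project uses for GPU libraries). `findStartFile` and the injected `Exists` callback are hypothetical.

```cpp
#include <filesystem>
#include <functional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: prefer <install>/lib/<triple>/<file> and fall back
// to <install>/lib/<file>. The fallback is returned even if it does not
// exist; the linker then reports the missing file.
fs::path findStartFile(const fs::path &InstallDir, const std::string &Triple,
                       const std::string &File,
                       const std::function<bool(const fs::path &)> &Exists) {
  const fs::path TargetSpecific = InstallDir / "lib" / Triple / File;
  if (Exists(TargetSpecific))
    return TargetSpecific;
  return InstallDir / "lib" / File;
}
```

Not checking the fallback keeps the driver simple and leaves the "file not found" diagnostic to the linker, as the message above intends.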
Summary:
I initially thought that it would be convenient to automatically link
these libraries like they are for standard C/C++ targets. However, this
created issues when trying to use C++ as a GPU target. This patch moves
the logic so that the libraries are now passed implicitly by the
offloading toolchain instead, if found. This means that the user needs
to set the target toolchain for the link job for automatic detection,
but linking can still be done manually via `-Xoffload-linker -lc`.
When working on very busy systems, check-offload frequently fails many
tests with this diagnostic:
```
clang: error: cannot determine amdgcn architecture: /tmp/llvm/build/bin/amdgpu-arch: Child timed out: ; consider passing it via '-march'
```
This patch accepts the environment variable
`CLANG_TOOLCHAIN_PROGRAM_TIMEOUT` to set the timeout. It also increases
the timeout from 10 to 60 seconds.
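A minimal sketch of reading the timeout, assuming the variable holds a whole number of seconds and that anything other than a positive decimal integer falls back to the new 60-second default. Those validation rules and the function names are assumptions, not the patch's actual behavior.

```cpp
#include <cctype>
#include <cstdlib>
#include <limits>

// New default after this change (previously 10 seconds).
constexpr unsigned DefaultTimeoutSeconds = 60;

// Hypothetical parser for the value of CLANG_TOOLCHAIN_PROGRAM_TIMEOUT.
// Assumption: invalid, zero or out-of-range values use the default.
unsigned parseTimeoutSeconds(const char *Value) {
  if (!Value || !std::isdigit(static_cast<unsigned char>(*Value)))
    return DefaultTimeoutSeconds;
  char *End = nullptr;
  const unsigned long Parsed = std::strtoul(Value, &End, 10);
  if (*End != '\0' || Parsed == 0 ||
      Parsed > std::numeric_limits<unsigned>::max())
    return DefaultTimeoutSeconds;
  return static_cast<unsigned>(Parsed);
}

// Reads the environment variable named in the commit message.
unsigned toolchainProgramTimeout() {
  return parseTimeoutSeconds(std::getenv("CLANG_TOOLCHAIN_PROGRAM_TIMEOUT"));
}
```

Separating parsing from `std::getenv` keeps the validation testable without touching the process environment.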
Summary:
The `ld.lld` linker handles LTO, but it does not understand the
target-id syntax some AMDGPU targets use. This patch parses the
target-id and passes the processor name in `-mcpu` and features in
`-mattr`.
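Splitting a target-id could look like the sketch below: the text before the first colon is the processor, and each `name+` or `name-` suffix becomes an LLVM-style `+name`/`-name` feature. `splitTargetId` and `joinFeatures` are hypothetical helpers; feature names are not validated.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split an AMDGPU target-id such as
// "gfx90a:sramecc+:xnack-" into the processor and LLVM-style features.
std::pair<std::string, std::vector<std::string>>
splitTargetId(const std::string &TargetId) {
  std::size_t Start = TargetId.find(':');
  const std::string Processor = TargetId.substr(0, Start);
  std::vector<std::string> Features;
  while (Start != std::string::npos) {
    const std::size_t End = TargetId.find(':', Start + 1);
    const std::string Feature =
        End == std::string::npos ? TargetId.substr(Start + 1)
                                 : TargetId.substr(Start + 1, End - Start - 1);
    // "xnack+" becomes "+xnack"; entries without a trailing sign are skipped.
    if (Feature.size() > 1 && (Feature.back() == '+' || Feature.back() == '-'))
      Features.push_back(Feature.back() +
                         Feature.substr(0, Feature.size() - 1));
    Start = End;
  }
  return {Processor, Features};
}

// Join features in the comma-separated form used by -mattr: "+a,-b".
std::string joinFeatures(const std::vector<std::string> &Features) {
  std::string Out;
  for (const std::string &F : Features) {
    if (!Out.empty())
      Out += ',';
    Out += F;
  }
  return Out;
}
```

The processor would then go to `-mcpu` and the joined string (for example `+xnack`) to `-mattr`. The exact spelling of the flags handed to `ld.lld` is not modeled here.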
Summary:
We can use `-r` and `--lto-emit-llvm` to get the LTO pass to optimize +
link LLVM-IR. Currently this doesn't work on AMDGPU because it always
passes `-shared` which is incompatible with `-r`. Fix that so we can use
it.
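The fix amounts to not emitting `-shared` when the user asks for a relocatable link. This hypothetical sketch works on plain argument strings rather than clang's argument list.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch: add -shared only when the user has not requested a
// relocatable link with -r, because ld.lld rejects the two together.
std::vector<std::string>
amdgpuLinkArgs(const std::vector<std::string> &UserArgs) {
  const bool Relocatable =
      std::find(UserArgs.begin(), UserArgs.end(), std::string("-r")) !=
      UserArgs.end();
  std::vector<std::string> Args;
  if (!Relocatable)
    Args.push_back("-shared");
  Args.insert(Args.end(), UserArgs.begin(), UserArgs.end());
  return Args;
}
```

With `-r --lto-emit-llvm`, the resulting link line carries only the user's flags, so LTO can optimize and emit linked LLVM-IR.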
Summary:
The previous patches (the other commits in this chain) allow the
offloading toolchain to directly invoke the device linker. Because of
this, we can now just have the toolchain implicitly include `-lc` and
`-lm` like a standard target does. This removes the old handling that
went through the fat binary `-lcgpu`.
Summary:
AMDGPU supports a `target-id` feature which is used to qualify targets
with different incompatible features. These are both rules and target
features. Currently, we pass `-target-cpu` twice when offloading with
OpenMP, and do not pass the target-id features at all. The effect was
that passing something like `--offload-arch=gfx90a:xnack+` would show up
as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`, ignoring the xnack
feature completely and passing the CPU twice. This patch fixes that to
pass it once and then split out the features the way HIP does.
Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinite hangs in any
compiler jobs running in the meantime.
This patch adds a ten-second timeout to these utilities, after which
the job is killed and an error is reported.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
13 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
Summary:
The AMDGPU toolchain simply took the short name to get the link job
instead of using the common utilities that respect options like
`-fuse-ld`. Any linker that isn't `ld.lld` will fail; however, we should
be able to override it.
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
when targeting AMDGPU:
```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
The forwarding header used by `hipstdpar` on AMDGPU targets is now
packaged with `rocThrust`. This change augments the ROCm Driver
component so that it can automatically pick up the packaged header iff
the user hasn't overridden it via the dedicated flag.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Summary:
The 'ockl' bitcode library from the ROCm device library contains several
implementations of functions like `printf` and `malloc`. We currently do
not depend on these in the OpenMP toolchain, so we shouldn't be linking
them. The primary motivation behind this change is the library rewriting
calls to `printf` and pulling in other unused 'hostcall' routines.
This patch adds the Driver changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What this change does can be summed up as follows:
- add two flags, one for enabling `hipstdpar` compilation, the second enabling the optional allocation interposition mode;
- the flags correspond to new LangOpt members;
- if we are compiling or linking with --hipstdpar, we enable HIP; in the compilation case C and C++ inputs are treated as HIP inputs;
- the ROCm / AMDGPU driver is augmented to look for and include an implementation detail forwarding header; we error out if the user requested `hipstdpar` but the header or its dependencies cannot be found.
Tests for the behaviour described above are also added.
Reviewed by: MaskRay, yaxunl
Differential Revision: https://reviews.llvm.org/D155775
Previously, for linking in amdgpu contexts, --no-undefined was appended to the options passed to lld,
overriding any user-supplied options via "-Wl," or "-Xlinker". We now prepend --no-undefined so that
the user options are respected.
Differential Revision: https://reviews.llvm.org/D158582
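The ordering change can be pictured with this hypothetical sketch: the driver default goes first and everything forwarded by the user follows. The user's choice then sticks for options where the linker honors the last occurrence.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: driver defaults are emitted before user-supplied
// linker options (from -Wl,... or -Xlinker ...), so a later user option
// can override a default when the linker applies "last one wins".
std::vector<std::string>
buildLldArgs(const std::vector<std::string> &UserLinkerArgs) {
  std::vector<std::string> Args{"--no-undefined"};  // default, prepended
  Args.insert(Args.end(), UserLinkerArgs.begin(), UserLinkerArgs.end());
  return Args;
}
```

For instance, `-Wl,-z,undefs` would now land after `--no-undefined` instead of before it.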
When -mcpu=native is specified, try detecting GPU
on the system by using amdgpu-arch tool. If it
fails to detect GPU, emit an error about GPU
not detected. If multiple GPUs are detected,
use the first GPU and emit a warning.
Reviewed by: Matt Arsenault, Fangrui Song
Differential Revision: https://reviews.llvm.org/D154531
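The selection policy for `-mcpu=native` can be modeled as below. `pickNativeGpu` and `NativeGpuChoice` are hypothetical, and the `Detected` list stands in for whatever `amdgpu-arch` reports.

```cpp
#include <optional>
#include <string>
#include <vector>

// Outcome of resolving -mcpu=native (hypothetical type).
struct NativeGpuChoice {
  std::optional<std::string> Cpu;  // no value: report "GPU not detected"
  bool WarnMultiple = false;       // true: warn that the first GPU is used
};

// Detected stands in for the architectures reported by amdgpu-arch.
NativeGpuChoice pickNativeGpu(const std::vector<std::string> &Detected) {
  NativeGpuChoice Choice;
  if (Detected.empty())
    return Choice;
  Choice.Cpu = Detected.front();
  Choice.WarnMultiple = Detected.size() > 1;
  return Choice;
}
```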
D150013 renders -L for AMDGPU, but updating tools::AddLinkerInputs is wrong:
it causes many non-isCrossCompiling targets to have duplicate -L options
because they already do `Args.AddAllArgs(CmdArgs, options::OPT_L);`.
Revert the change and add an `Args.AddAllArgs(CmdArgs, options::OPT_L);` call instead.
ROCm used to install components under individual directories,
e.g. HIP installed to /opt/rocm/hip and rocblas installed to
/opt/rocm/rocblas. ROCm has transitioned to a flat directory
structure where all components are installed to /opt/rocm.
HIP_PATH and --hip-path are supposed to be /opt/rocm, as
clang detects the HIP version from /opt/rocm/share/hip/version.
However, some existing HIP apps still use HIP_PATH=/opt/rocm/hip.
To avoid a regression, clang will also try to detect share/hip/version
under the parent directory of HIP_PATH or --hip-path.
This way, detection works for both the new and the
old HIP_PATH layouts.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D154077
Fixes: SWDEV-407757
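The two-location lookup can be sketched as follows: try `share/hip/version` under the given HIP path, then under its parent, so that both `/opt/rocm` and the older `/opt/rocm/hip` resolve. `findHipVersionFile` and the injected `Exists` callback are hypothetical.

```cpp
#include <filesystem>
#include <functional>
#include <initializer_list>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper: locate share/hip/version for a HIP path given via
// HIP_PATH or --hip-path.
std::optional<fs::path>
findHipVersionFile(fs::path HipPath,
                   const std::function<bool(const fs::path &)> &Exists) {
  // Drop a trailing separator so parent_path() really means "one level up".
  if (!HipPath.has_filename())
    HipPath = HipPath.parent_path();
  const fs::path Rel = fs::path("share") / "hip" / "version";
  // New layout: the HIP path is the ROCm root (/opt/rocm).
  // Old layout: it is a subdirectory (/opt/rocm/hip), so also try the parent.
  for (const fs::path &Root : {HipPath, HipPath.parent_path()}) {
    const fs::path Candidate = Root / Rel;
    if (Exists(Candidate))
      return Candidate;
  }
  return std::nullopt;
}
```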
Currently, AMDGPU more or less only supports linking with LTO. If the
user does not pass either `-flto` or `-Wl,-plugin-opt=mcpu=` manually,
linking will fail because the architectures aren't compatible. This
patch simply passes `-mcpu` by default if it was specified. This should
be a no-op if it's not actually used.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D153909
In preparation for moving the `#include "llvm/ADT/StringExtras.h"`
from the header to the source file of `llvm/Support/Error.h`, first add
all the missing includes that were previously included transitively
through this header.