This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.
```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
The forwarding header used by `hipstdpar` on AMDGPU targets is now
pacakged with `rocThrust`. This change augments the ROCm Driver
component so that it can automatically pick up the packaged header iff
the user hasn't overridden it via the dedicated flag.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Summary:
The 'ockl' bitcode library from the ROCm device library contains several
implementations of functions like `printf` and `malloc`. We currently do
not depend on these in the OpenMP toolchain, so we shouldn't be linking
them. The primary motivation behind this change is the library rewriting
calls to `printf` and pulling in other unused 'hostcall' routines.
This patch adds the Driver changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What this change does can be summed up as follows:
- add two flags, one for enabling `hipstdpar` compilation, the second enabling the optional allocation interposition mode;
- the flags correspond to new LangOpt members;
- if we are compiling or linking with --hipstdpar, we enable HIP; in the compilation case C and C++ inputs are treated as HIP inputs;
- the ROCm / AMDGPU driver is augmented to look for and include an implementation detail forwarding header; we error out if the user requested `hipstdpar` but the header or its dependencies cannot be found.
Tests for the behaviour described above are also added.
Reviewed by: MaskRay, yaxunl
Differential Revision: https://reviews.llvm.org/D155775
Previously, for linking in amdgpu contexts, the --no-undefined was appended to the options passed to lld,
overriding any user-supplied options via "-Wl," or "-Xlinker". We now prepend --no-undefined so that
the user options are respected.
Differential Revision: https://reviews.llvm.org/D158582
When -mcpu=native is specified, try detecting GPU
on the system by using amdgpu-arch tool. If it
fails to detect GPU, emit an error about GPU
not detected. If multiple GPUs are detected,
use the first GPU and emit a warning.
Reviewed by: Matt Arsenault, Fangrui Song
Differential Revision: https://reviews.llvm.org/D154531
D150013 is to render -L for AMDGPU but updating tools::AddLinkerInputs is wrong
and causes many non-isCrossCompiling targets to have duplicate -L options
because they do `Args.AddAllArgs(CmdArgs, options::OPT_L);`.
Revert the change and add a `Args.AddAllArgs(CmdArgs, options::OPT_L);` instead.
ROCm used to install components under individual directories,
e.g. HIP installed to /opt/rocm/hip and rocblas installed to
/opt/rocm/rocblas. ROCm has transitioned to a flat directory
structure where all components are installed to /opt/rocm.
HIP-PATH and --hip-path are supposed to be /opt/rocm as
clang detect HIP version by /opt/rocm/share/hip/version.
However, some existing HIP app still uses HIP-PATH=/opt/rocm/hip.
To avoid regression, clang will also try detect share/hip/version
under the parent directory of HIP-PATH or --hip-path.
This way, the detection will work for both new HIP-PATH and
old HIP-PATH.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D154077
Fixes: SWDEV-407757
Currently, AMDGPU more or less only supports linking with LTO. If the
user does not either pass `-flto` or `-Wl,-plugin-opt=mcpu=` manually
linking will fail because the architecture's aren't compatible. THis
patch simply passes `-mcpu` by default if it was specified. Should be a
no-op if it's not actually used.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D153909
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
Fix two issues:
--hip-path should not do rigorous checking, i.e. if .hipVersion exists it
will use it, otherwise it will not error out but assumes the default
HIP version. This is to be consistent with --rocm-path behavior.
when HIP_PATH is empty, it should be ignored. This is to be consistent
with ROCM_PATH behavior.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D152734
Fixes: SWDEV-404771
Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes.
If WGP mode is not supported, ignore -mno-cumode and emit a warning.
This is needed for implementing device functions like __smid
(312dff7b79/include/hip/amd_detail/amd_device_functions.h (L957))
Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner
Differential Revision: https://reviews.llvm.org/D145343
HIP may be installed into /usr or /usr/local on a variety of Linux
operating systems. It may become unwieldy to list them all.
Reviewed by: Siu Chi Chan, Yaxun Liu
Differential Revision: https://reviews.llvm.org/D149110
AMDGPU uses ELF shared libraries to implement their executable device
images. One downside to this method is that it disables regular warnings
on undefined symbols. This is because shared libraries expect these to
be resolves by later loads. However, the GPU images do not support
dynamic linking so any undefined symbol is going to cause a runtime
error. This patch adds `--no-undefined` to the `ld.lld` invocation to guarantee
that undefined symbols are always caught as linking errors rather than
runtime errors.
Reviewed By: arsenm, MaskRay, #amdgpu
Differential Revision: https://reviews.llvm.org/D145941
The AMDGPU toolchain support directly compiling GPU images using
cross-compilation such as `clang --target=amdgcn-amd-amdhsa foo.c`.
However, when attempting to link bitcode this does not work because the
`-mcpu` options are not forwarded to the linker among others. This patch
simply adds them so that `clang --target=amdgcn-amd-amdhsa foo.c -flto`
works correctly.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D144505
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
Lazyly initialize uncommon toolchain detector
Cuda and rocm toolchain detectors are currently run unconditionally,
while their result may not be used at all. Make their initialization
lazy so that the discovery code is not run in common cases.
Reapplied since 77910ac374656319ff114ef251fda358d4aa166a landed and
fixes the test ordering issue.
Differential Revision: https://reviews.llvm.org/D142606
Cuda and rocm toolchain detectors are currently run unconditionally,
while their result may not be used at all. Make their initialization
lazy so that the discovery code is not run in common cases.
Differential Revision: https://reviews.llvm.org/D142606
The AMDGPU target only emits shared libraries currently. This patch
changes the handling of the PIC level to be managed in the
AMDGPUToolChain rather than having a special case for it. This causes
`--target=amdgcn--` to no longer set the PIC. This should be an
acceptable change since that doesn't use a correct toolchain anyway.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142999
- Add support for finding device libraries in new ROCm directory
structure
- Simplify and remove the handling of legacy ROCm directory structure
Change-Id: I04da3bc9da85ced4b56b0225efb6b94448b8c5a1
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D140315
This patch adds basic support for `--offload-arch=native` to CUDA. This
is done using the `nvptx-arch` tool that was introduced previously. Some
of the logic for handling executing these tools was factored into a
common helper as well. This patch does not add support for OpenMP or the
"new" driver. That will be done later.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D141051
HIP is installed at /usr or /usr/local on Debin/Fedora,
and the version file is at {root}/share/hip/version.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D135796
The clang compiler prepends the HIP header include paths to the search
list using -internal-isystem when building for the HIP language. This
prevents warnings related to things like reserved identifiers when
including the HIP headers even when ROCm is installed in a non-system
directory, such as /opt/rocm.
However, when HIP is installed in /usr, then the prepended include
path would be /usr/include. That is a problem, because the C standard
library headers are stored in /usr/include and the C++ standard
library headers must come before the C library headers in the search
path list (because the C++ standard library headers use #include_next
to include the C standard library headers).
While the HIP wrapper headers _do_ need to be earlier in the search
than the C++ headers, those headers get their own subdirectory and
their own explicit -internal-isystem argument. This include path is for
<hip/hip_runtime_api.h> and <hip/hip_runtime.h>, which do not require a
particular search ordering with respect to the C or C++ headers. Thus,
HIP include path is added after other system include paths.
With contribution from Cordell Bloor.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D120132
New device library supporting v4 and v5 has abi_version_400.bc and abi
version_500.bc.
For v5, abi_version_500.bc is linked.
For v2-4, abi_version_400.bc is linked.
For old device library, for v2-4, none of the above is linked. For v5,
error is emitted about unsupported ABI version.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D118949
Fixes: SWDEV-321313
This patch adds a new tool chain, HIPSPVToolChain, for emitting HIP
device code as SPIR-V binary. The SPIR-V binary is emitted by using an
external tool, SPIRV-LLVM-Translator, temporarily. We intend to switch
the translator to the llc tool when the SPIR-V backend lands on LLVM
and proves to work well on HIP implementations which consume SPIR-V.
Before the SPIR-V emission the tool chain loads an optional external
pass plugin, either automatically from a HIP installation or from a
path pointed by --hipspv-pass-plugin, and runs passes that are meant
to expand/lower HIP features that do not have direct counterpart in
SPIR-V (e.g. dynamic shared memory).
Code emission for SPIR-V will be enabled and HIPSPVToolChain tests
will be added in the follow up patch part 3.
Other changes: New option ‘-nohipwrapperinc’ is added to exclude HIP
include wrappers. The reason for the addition is that they cause
compile errors when compiling HIP sources for the host side for HIPCL
and HIPLZ implementations. New option is added to avoid this issue.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D110618
Math libraries are linked only when -lm is specified. This is because
host system could be missing rocm-device-libs.
Reviewed By: JonChesterfield, yaxunl
Differential Revision: https://reviews.llvm.org/D105981