153 Commits

Author SHA1 Message Date
Fraser Cormack
df74736732
[clang] Add the ability to link libclc OpenCL libraries (#146503)
This commit adds driver support for linking libclc OpenCL libraries. It
takes the form of a new optional flag: --libclc-lib=namespec. Nothing is
linked unless this flag is specified.

Not all libclc targets have corresponding clang targets. For this reason
it is desirable for users to be able to specify a libclc library name.
We support this by taking both a library name (without the .bc suffix)
or a filename. Both of these are searched for in the clang resource
directory. Filenames are
also checked themselves so that absolute paths can be provided. The
syntax for specifying filenames (as opposed to library names) uses a
leading colon (:), inspired by the -l option.

To accommodate this option, libclc libraries are now placed into clang's
resource directory in an in-tree configuration. The libraries are all
placed in <resource-dir>/lib/libclc and
are not grouped under host-specific directories as some other runtime
libraries are; it is not expected that OpenCL libraries will differ
depending on the host toolchain.

Currently only the AMDGPU toolchain supports this option as a proof of
concept. Other targets such as NVPTX or SPIR/SPIR-V could support it
too. We could optionally let target toolchains search for libclc
libraries themselves, possibly when passed an empty --libclc-lib.
2025-08-04 15:37:22 +01:00
Jakub Chlanda
073460a2a3
[HIP][Clang][Driver] Move BC preference logic into ROCm detection (#149294)
This patch provides a single point for handling the logic behind
choosing common bitcode libraries. The intention is that the users of
ROCm installation detector will not have to rewrite options handling
code each time the bitcode libraries are queried. This is not too
distant from detectors for other architecture that encapsulate the
similar decision making process, providing cleaner interface. The only
flag left in `getCommonBitcodeLibs` (main point of entry) is
`NeedsASanRT`, this is deliberate, as in order to calculate it we need
to consult `ToolChain`.
2025-07-23 09:45:11 +02:00
Joseph Huber
a7d93653a6
[Clang] Rework creating offloading toolchains (#125556)
Summary:
This patch reworks how we create offloading toolchains. Previously we
would handle this separately for all the different kinds. This patch
instead changes this to use the target triple and the offloading kind to
determine the proper toolchain. In the old case where the user only
passes `--offload-arch` we instead infer the triple from the passed
arguments. This is a pretty major overhaul but currently passes all the
clang tests with only minor changes to error messages.
2025-07-21 18:36:39 -05:00
Joseph Huber
ddb018f8d3
[Clang][NFC] Add alias target for amdgpu-arch-tool and nvptx-arch-tool (#147558)
Summary:
These commands both do the same thing and behave like the same tool.
Now, the `nvptx-arch` and `amdgpu-arch` tools cause it to only emit
architectures for that name.
2025-07-08 11:40:22 -05:00
Yaxun (Sam) Liu
44936c8d13
[CUDA][HIP] add options --[no-]offload-inc (#140106)
Currently there is only option -nogpuinc for disabling
the default CUDA/HIP wrapper headers. However, there
are situations where -nogpuinc needs to be overriden
for enabling CUDA/HIP wrapper headers. This patch
adds --[no-]offload-inc for that purpose. When both
exist, the last wins. -nogpuinc and -nocudainc are
now alias to --no-offload-inc.
2025-06-23 11:02:06 -04:00
Cameron McInally
a42bb8b57a
[Driver] Move CommonArgs to a location visible by the Frontend Drivers (#142800)
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.

We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.

Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:

  `clang -cc1 -mprefer-vector-width=junk test.c'

With this patch, we'll now be able to tighten up incomming options to
the Frontend drivers in a lightweight way.

---------

Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
2025-06-06 17:59:24 -04:00
Kazu Hirata
6c37341943
[Driver] Remove unused includes (NFC) (#141448)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-26 09:13:36 -07:00
Brad Smith
dddcbc26d6
[Driver][LTO] Move common code for LTO to addLTOOptions() (#74178) 2025-05-23 23:03:37 -04:00
Kazu Hirata
9adcb4fe12
[clang] Use llvm::replace (NFC) (#140264) 2025-05-16 09:06:31 -07:00
Matt Arsenault
ccfb97b421 Revert "clang/AMDGPU: Stop looking for oclc_daz_opt_* control libraries (#134805)"
This reverts commit 028429ac452acde227ae0bfafbfe8579c127e1ea and
1004fae222efeee215780c4bb4e64eb82b07fb4f.

These really need to be part of the compiler distribution. Bots are
relying on a nearly year old version to provide bitcode.
2025-04-13 14:47:39 +02:00
Matt Arsenault
028429ac45
clang/AMDGPU: Stop looking for oclc_daz_opt_* control libraries (#134805)
These have been empty since July 2023
2025-04-13 09:48:21 +02:00
Matt Arsenault
54cdc75857
clang/AMDGPU: Stop looking for hip.bc in device libs (#134801)
This has been an empty library since January 2023
2025-04-09 16:39:50 +02:00
Shilei Tian
f19c6f23ab
[Clang][AMDGPU] Improve error message when device libraries for COV6 are missing (#134745)
#130963 switches the default to COV6, which requires ROCm 6.3.
Currently, if the
device libraries for COV6 are not found, the error message is not very
helpful.
This PR provides a more informative error message in such cases.
2025-04-08 09:57:43 -04:00
Harmen Stoppels
bd788dbf51
[AMDGPU] Remove detection of hip runtime for Spack (#133263)
There is special logic to detect the hip runtime when llvm is installed
with Spack. It works by matching the install prefix of llvm against
`llvm-amdgpu-*` followed by effectively globbing for

```
<llvm dir>/../hip-x.y.z-*/
```

and checking there is exactly one such directory.

I would suggest to remove autodetection for the following reasons:

1. In the Spack ecosystem it's by design that every package lives in
   its own prefix, and can only know where its dependencies are
   installed, it has no clue what its dependents are and where they are
installed. This heuristic detection breaks that invariant, since `hip`
   is a dependent of `llvm`, and can be surprising to Spack users.
2. The detection can lead to false positives, since users can be using
   an llvm installed "upstream" with their own build of hip locally, and
   they may not realize that clang is picking up upstream hip instead of
   their local copy.
3. It only works if the directory name is `llvm-amdgpu-*` which happens
   to be the name of AMD's fork of `llvm`, so it makes no sense that
   this code lives in the main LLVM repo for which the Spack package
   name is `llvm`. Feels wrong that LLVM knows about Spack package
   names, which can change over time.
4. Users can change the install directory structure, meaning that this
   detection is not robust under config changes in Spack.
2025-04-02 14:52:27 +07:00
Joseph Huber
e9d517d183
[Clang] Handle -flto-partitions generically and forward it properly (#133283)
Summary:
The https://github.com/llvm/llvm-project/pull/128509 patch introduced
`--flto-partitions`. This was marked as a HIP only argument, and was
also spelled and handled incorrectly for an `-f` option. This patch
makes the handling generic for `ld.lld` consumers.

This also fixes some issues with emitting the flags being put after the
default arguments, preventing users from overriding them. Also, forwards
things properly for the new driver so we can test this.
2025-03-27 14:31:35 -05:00
Joseph Huber
d0aa1f9c43
[Clang] Make --lto-partitions only default for HIP (#133164)
Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the `libc` project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.

I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the `--target=amdgcn-amd-amdhsa` for its output,
which means we'd need to forward the default somehow.
2025-03-27 07:34:57 -05:00
Pierre van Houtryve
ed022d93b2
[clang][AMDGPU] Enable module splitting by default (#128509)
The default number of partitions is the number of cores on the machine
with a cap at 16, as going above 16 is unlikely to be useful in the
common case.

Adds a flto-partitions option to override the number of partitions
easily (without having to use -Xoffload-linker). Setting it to 1
effectively disables module splitting.

Fixes SWDEV-506214
2025-03-24 14:43:08 +01:00
Joseph Huber
f6e3d33c00
[Clang][NFC] Introduce --offloadlib positive flag for nogpulib and alias to --no-offloadlib (#126567)
Summary:
We support `nogpulib` to disable implicit libraries. In the future we
will want to change the default linking of these libraries based on the
user language. This patch just introduces a positive variant so now we
can do `-nogpulib -gpulib` to disable it.

Later patch will make the default a variable in the ROCmToolChain
depending on the target languages.
2025-02-13 07:59:08 -06:00
Amit Kumar Pandey
46f1bab793
Reapply "[Driver][ROCm][OpenMP] Fix default ockl linking for OpenMP."… (#126671)
- This reverts commit
0c6c4a9993.
  - Add '-mcode-object-version=5' as to explicitly use code object
    version 5 to match with 'FAIL' diagnostic.
  - Add Requires directive to support lit test run on platforms
    registered with x86_64 and amdgpu.
2025-02-12 13:40:51 +05:30
Florian Mayer
0c6c4a9993
Revert "[Driver][ROCm][OpenMP] Fix default ockl linking for OpenMP." (#126628)
Reverts llvm/llvm-project#126186

This broke the sanitizer buildbot:
https://lab.llvm.org/buildbot/#/builders/55/builds/6846
2025-02-10 16:35:26 -08:00
Amit Kumar Pandey
c69be3fe4b
[Driver][ROCm][OpenMP] Fix default ockl linking for OpenMP. (#126186)
ASan gpu runtime (asanrtl.bc) linking is dependent on 'ockl.bc'. Link
'ockl.bc' only when ASan is enabled for openmp amdgpu offloading
application.
2025-02-10 21:41:49 +05:30
Joseph Huber
ff049e07a5
[AMDGPU] Do not enable GPU sanitizers by default (#126090)
Summary:
This probably wasn't the intended result, but the code here causes
OpenMP to always link in `ockl.bc` which was intentionally not linked.
This results in the OCKL definitions conflicting with the OpenMP ones
and also prevents them from being optimized out (Might be fixed with
newer ROCm that actually builds the visibility correctly).

I'm pretty sure the only reason this didn't break the tests is because
we're smart and pass `-nogpulib` there to keep the environment from
being poisoned with stuff like this.
2025-02-06 12:10:34 -06:00
Amit Kumar Pandey
b68b4f64a2
[Driver][ASan] Refactor Clang-Driver "Sanitizer Bitcode" linking. (#123922)
ASan bitcode linking is currently available for HIPAMD,OpenMP and
OpenCL. Moving sanitizer specific common parts of logic to appropriate
API's so as to reduce code redundancy and maintainability.
2025-01-30 16:28:03 +05:30
Joseph Huber
d4c4180417
[Clang] Add a flag to include GPU startup files (#112025)
Summary:
The C library for GPUs provides the ability to target regular C/C++
programs by providing the C library and a file containing kernels that
call the `main` function. This is mostly used for unit tests, this patch
provides a quick way to add them without needing to know the paths. I
currently do this explicitly, but according to the libc++ contributors
we don't want to need to specify these paths manually. See the
discussion in https://github.com/llvm/llvm-project/pull/104515.

I just default to `lib/` if the target-specific one isn't found because
the linker will handle giving a reasonable error message if it's not
found. Basically the use-case looks like this.

```console
$ clang test.c --target=amdgcn-amd-amdhsa -mcpu=native -startfiles -stdlib
$ amdhsa-loader a.out
PASS!
```
2024-10-28 07:17:19 -07:00
Joseph Huber
ba8c96593c
[Clang] Do not implicitly link C libraries for the GPU targets (#109052)
Summary:
I initially thought that it would be convenient to automatically link
these libraries like they are for standard C/C++ targets. However, this
created issues when trying to use C++ as a GPU target. This patch moves
the logic to now implicitly pass it as part of the offloading toolchain
instead, if found. This means that the user needs to set the target
toolchain for the link job for automatic detection, but can still be
done manually via `-Xoffload-linker -lc`.
2024-09-18 06:44:07 -07:00
Amit Kumar Pandey
5dd1c82778
[NFC][AMDGPU][Driver] Move 'shouldSkipSanitizeOption' utility to AMDGPU. (#107997)
HIPAMDToolChain and AMDGPUOpenMPToolChain both depends on the
"shouldSkipSanitizeOption" api to sanitize/not sanitize device code.
2024-09-10 19:11:51 +05:30
Joel E. Denny
1ea0865dd6
[Clang] Add env var for nvptx-arch/amdgpu-arch timeout (#102521)
When working on very busy systems, check-offload frequently fails many
tests with this diagnostic:

```
clang: error: cannot determine amdgcn architecture: /tmp/llvm/build/bin/amdgpu-arch: Child timed out: ; consider passing it via '-march'
```

This patch accepts the environment variable
`CLANG_TOOLCHAIN_PROGRAM_TIMEOUT` to set the timeout. It also increases
the timeout from 10 to 60 seconds.
2024-08-09 13:39:29 -04:00
Joseph Huber
2c0ec01b51
[AMDGPU] Correctly pass the target-id to ld.lld (#101037)
Summary:
The `ld.lld` linker handles LTO, but it does not understand the
target-id syntax some AMDGPU targets use. This patch parses the
target-id and passes the processor name in `-mcpu` and features in
`-mattr`.
2024-07-29 12:09:37 -05:00
Joseph Huber
50835f0a36
[Clang] Do not pass -shared when using -r for AMDGPU (#100760)
Summary:
We can use `-r` and `--lto-emit-llvm` to get the LTO pass to optimize +
link LLVM-IR. Currently this doesn't work on AMDGPU because it always
passes `-shared` which is incompatible with `-r`. Fix that so we can use
it.
2024-07-26 10:25:28 -05:00
Joseph Huber
4f516aa04b
[Clang] Make the GPU toolchains implicitly link -lm and -lc (#98170)
Summary:
The previous patches (The other commits in this chain) allow the
offloading toolchain to directly invoke the device linker. Because of
this, we can now just have the toolchain implicitly include `-lc` and
`-lm` like a standard target does. This removes the old handling that
went through the fat binary `-lcgpu`.
2024-07-23 18:30:30 -05:00
Yaxun (Sam) Liu
60fa7c7690
Enable ASAN in amdgpu toolchain for OpenCL (#96262) 2024-06-21 16:12:56 -04:00
Joseph Huber
374f6554c3
[OpenMP] Fix passing target id features to AMDGPU offloading (#94765)
Summary:
AMDGPU supports a `target-id` feature which is used to qualify targets
with different incompatible features. These are both rules and target
features. Currently, we pass `-target-cpu` twice when offloading to
OpenMP, and do not pass the target-id features at all. The effect was
that passing something like `--offload-arch=gfx90a:xnack+` would show up
as `-target-cpu=gfx90a:xnack+ -target-cpu=gfx90a`. Thus ignoring the
xnack completely and passing it twice. This patch fixes that to pass it
once and then separate it like how HIP does.
2024-06-07 11:14:16 -05:00
Joseph Huber
2981f3a284
[Clang] Add timeout for GPU detection utilities (#94751)
Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinate hangs on any
compiler jobs running in the meantime.

This patch adds a ten second timeout period for these utilities before
it kills the job and errors out.
2024-06-07 08:45:35 -05:00
Kazu Hirata
135d92f903
[Driver] Use StringRef::operator== instead of StringRef::equals (NFC) (#91698)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  13 under clang/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-09 23:12:08 -07:00
Joseph Huber
62549dbbf2
[AMDGPU] Correctly determine the toolchain linker (#89803)
Summary:
The AMDGPU toolchain simply took the short name to get the link job
instead of using the common utilities that respect options like
`-fuse-ld`. Any linker that isn't `ld.lld` will fail, however we should
be able to override it.
2024-04-24 06:46:31 -05:00
Jun Wang
86842e1f72
[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236)
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-04-10 10:47:04 -07:00
Fangrui Song
2b5cd8be3a [Driver] Remove InstallDir and getInstalledDir. NFC
Follow-up to #80527.
2024-03-03 18:10:46 -08:00
Joseph Huber
99660082cb
[Clang] Append target search paths for direct offloading compilation (#82699)
Summary:
Recent changes to the `libc` project caused the headers to be installed
to `include/<triple>` for the GPU and the libraries to be in
`lib/<triple>`. This means we should automatically append these search
paths so they can be found by default. This allows the following to work
targeting AMDGPU.

```shell
$ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
$ amdhsa-loader a.out
```
2024-02-23 14:21:02 -06:00
Yaxun (Sam) Liu
46b6756255
[AMDGPU] Diagnose unaligned atomic (#80322)
AMDGPU does not support unaligned atomics, therefore make the warning an
error.

This patch is transferred from

https://reviews.llvm.org/D99201
2024-02-02 10:41:47 -05:00
Yaxun (Sam) Liu
fcd3752342
[HIP] fix HIP detection for /usr (#80190)
Skip checking HIP version file under parent directory for /usr/local
since /usr will be checked after /usr/local.

Fixes: https://github.com/llvm/llvm-project/issues/78344
2024-02-01 10:33:51 -05:00
Alex Voicu
907f2a0927
[HIP][Driver] Automatically include hipstdpar forwarding header (#78915)
The forwarding header used by `hipstdpar` on AMDGPU targets is now
pacakged with `rocThrust`. This change augments the ROCm Driver
component so that it can automatically pick up the packaged header iff
the user hasn't overridden it via the dedicated flag.
2024-01-23 00:55:59 +00:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Joseph Huber
5513d58ad5
[OpenMP][AMDGPU] Do not include 'ockl' implementations in OpenMP (#70462)
Summary:
The 'ockl' bitcode library from the ROCm device library contains several
implementations of functions like `printf` and `malloc`. We currently do
not depend on these in the OpenMP toolchain, so we shouldn't be linking
them. The primary motivation behind this change is the library rewriting
calls to `printf` and pulling in other unused 'hostcall' routines.
2023-10-27 14:56:29 -05:00
Alex Voicu
9a408588d1 [HIP][Clang][Driver] Add Driver support for hipstdpar
This patch adds the Driver changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What this change does can be summed up as follows:

- add two flags, one for enabling `hipstdpar` compilation, the second enabling the optional allocation interposition mode;
- the flags correspond to new LangOpt members;
- if we are compiling or linking with --hipstdpar, we enable HIP; in the compilation case C and C++ inputs are treated as HIP inputs;
- the ROCm / AMDGPU driver is augmented to look for and include an implementation detail forwarding header; we error out if the user requested `hipstdpar` but the header or its dependencies cannot be found.

Tests for the behaviour described above are also added.

Reviewed by: MaskRay, yaxunl

Differential Revision: https://reviews.llvm.org/D155775
2023-10-03 13:14:46 +01:00
Jacob Lambert
0661533e41 [AMDGPU] Prepend --no-undefined option for linker instead of append
Previously, for linking in amdgpu contexts, the --no-undefined was appended to the options passed to lld,
overriding any user-supplied options via "-Wl," or "-Xlinker". We now prepend --no-undefined so that
the user options are respected.

Differential Revision: https://reviews.llvm.org/D158582
2023-08-23 12:25:01 -07:00
Yaxun (Sam) Liu
91b9bdeb92 [AMDGPU] Support -mcpu=native for OpenCL
When -mcpu=native is specified, try detecting GPU
on the system by using amdgpu-arch tool. If it
fails to detect GPU, emit an error about GPU
not detected. If multiple GPUs are detected,
use the first GPU and emit a warning.

Reviewed by: Matt Arsenault, Fangrui Song

Differential Revision: https://reviews.llvm.org/D154531
2023-07-13 16:21:35 -04:00
Fangrui Song
681cb54a54 [Driver] Fix duplicate -L after D150013
D150013 is to render -L for AMDGPU but updating tools::AddLinkerInputs is wrong
and causes many non-isCrossCompiling targets to have duplicate -L options
because they do `Args.AddAllArgs(CmdArgs, options::OPT_L);`.

Revert the change and add a `Args.AddAllArgs(CmdArgs, options::OPT_L);` instead.
2023-07-07 23:57:45 -07:00
Yaxun (Sam) Liu
41a1625e07 [HIP] Fix version detection for old HIP-PATH
ROCm used to install components under individual directories,
e.g. HIP installed to /opt/rocm/hip and rocblas installed to
/opt/rocm/rocblas. ROCm has transitioned to a flat directory
structure where all components are installed to /opt/rocm.
HIP-PATH and --hip-path are supposed to be /opt/rocm as
clang detect HIP version by /opt/rocm/share/hip/version.
However, some existing HIP app still uses HIP-PATH=/opt/rocm/hip.
To avoid regression, clang will also try detect share/hip/version
under the parent directory of HIP-PATH or --hip-path.
This way, the detection will work for both new HIP-PATH and
old HIP-PATH.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D154077

Fixes: SWDEV-407757
2023-06-29 14:57:26 -04:00
Joseph Huber
765301183f [AMDGPU] Always pass -mcpu to the lld linker
Currently, AMDGPU more or less only supports linking with LTO. If the
user does not either pass `-flto` or `-Wl,-plugin-opt=mcpu=` manually
linking will fail because the architecture's aren't compatible. THis
patch simply passes `-mcpu` by default if it was specified. Should be a
no-op if it's not actually used.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D153909
2023-06-28 08:52:37 -05:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00