llvm-project

Author	SHA1	Message	Date
Jefferson Le Quellec	2921a0928c	Make the argument -Xcuda-ptxas visible to the driver in cl-mode It has been noticed that the arguments are being passed twice to ptxas. This also has been fixed by filtering out the arguments before appending them to the new DAL created by CudaToolChain::TranslateArgs. github:https://github.com/llvm/llvm-project/pull/86807	2024-04-08 14:11:43 +01:00
Yichen Yan	047b2b241d	[NVPTX] Add `-march=generic` option to mirror default configuration (#85222) This PR adds `-march=generic` support for the NVPTX backend. This fulfills a TODO introduced in #79873. With this PR, users can explicitly request the "default" CUDA architecture, which makes sure that no specific architecture is specified. This PR does not address any compatibility issues between different CUDA versions. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-03-15 17:16:10 -05:00
Joseph Huber	3a56b5a27d	[CUDA] Include PTX in non-RDC mode using the new driver (#84367) Summary: The old driver embeds PTX in rdc-mode and so does the `nvcc` compiler. The new driver currently does not do this, so we should keep it consistent in this case. This simply requires adding the assembler output as an input to the offloading action that gets fed to fatbin.	2024-03-07 16:53:41 -06:00
Joseph Huber	1977404d20	[OpenMP] Respect LLVM per-target install directories (#83282) Summary: One recurring problem we have with the OpenMP libraries is that they are potentially conflicting with ones found on the system. This occurs when there are two copies and one is used for linking that is not attached to the corresponding clang compiler. LLVM already uses target specific directories for this, like with libc++, which are always searched first. This patch changes the install directory to be `lib/x86_64-unknown-linux-gnu` for example. Notable changes would be that users will need to change their LD_LIBRARY_PATH settings optionally, or use default rt-rpath options. This should fix problems where users are linking the wrong versions of static libraries.	2024-02-28 15:39:27 -06:00
Joseph Huber	99660082cb	[Clang] Append target search paths for direct offloading compilation (#82699) Summary: Recent changes to the `libc` project caused the headers to be installed to `include/<triple>` for the GPU and the libraries to be in `lib/<triple>`. This means we should automatically append these search paths so they can be found by default. This allows the following to work targeting AMDGPU. ```shell $ clang foo.c -flto -mcpu=native --target=amdgcn-amd-amdhsa -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o $ amdhsa-loader a.out ```	2024-02-23 14:21:02 -06:00
Joseph Huber	d5a15f3116	[Clang][NVPTX] Allow passing arguments to the linker while standalone (#73030) Summary: We support standalone compilation for the NVPTX architecture using 'nvlink' as our linker. Because of the special handling required to transform input files to cubins, as nvlink expects for some reason, we didn't use the standard AddLinkerInput method. However, this also meant that we weren't forwarding options passed with -Wl to the linker. Add this support in for the standalone toolchain path. Revived from https://reviews.llvm.org/D149978	2024-02-22 16:27:53 -06:00
Joseph Huber	7155c1ef65	[NVPTX] Allow compiling LLVM-IR without `-march` set (#79873) Summary: The NVPTX tools require an architecture to be used, however if we are creating generic LLVM-IR we should be able to leave it unspecified. This will result in the `target-cpu` attributes not being set on the functions so it can be changed when linked into code. This allows the standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR similar to how CUDA's deviceRTL looks from C/C++	2024-01-30 21:44:43 -06:00
Joseph Huber	82d335e70f	[NVPTX] Add support for -march=native in standalone NVPTX (#79373) Summary: We support `--target=nvptx64-nvidia-cuda` as a way to target the NVPTX architecture from standard CPU. This patch simply uses the existing support for handling `--offload-arch=native` to also apply to the standalone toolchain.	2024-01-25 15:56:13 -06:00
Kazu Hirata	10886a8f0a	[Driver] Use SmallString::operator std::string (NFC)	2024-01-19 22:24:09 -08:00
Kazu Hirata	f3dcc2351c	[clang] Use StringRef::{starts,ends}_with (NFC) (#75149) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 08:54:13 -08:00
Artem Belevich	631c6e834c	[CUDA] Add support for CUDA-12.3 and sm_90a (#74895)	2023-12-11 12:18:28 -08:00
Brad Smith	8a4b9e9965	[Driver] Move assertion check before checking Output.isFilename (#67210)	2023-09-25 20:19:25 -04:00
Takuya Shimizu	01b88dd66d	[NFC] Remove unused variables declared in conditions D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}` This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch. Differential Revision: https://reviews.llvm.org/D158016	2023-08-30 10:05:06 +09:00
Yaxun (Sam) Liu	e17882430e	[CUDA][HIP] Rename and fix `-fcuda-approx-transcendentals` Rename -fcuda-approx-transcendentals as -fgpu-approx-transcendentals and pass it to both device and host clang -cc1. Fix its interaction with -ffast-math to allow -fno-gpu-approx-transcendentals to override the implicit -fcuda-approx-transcendentals due to -ffast-math. Rename the predefined macro to be __CLANG_GPU_APPROX_TRANSCENDENTALS__. Emit the macro for both device and host compilation. Reviewed by: Artem Belevich, Fangrui Song Differential Revision: https://reviews.llvm.org/D154797	2023-07-25 12:01:41 -04:00
Joseph Huber	d2ac0069a2	[Clang] Only emit CUDA version warnings when creating the CUDA toolchain This warning primarily applies to users of the CUDA language as there may be new features we rely on. The other two users of the toolchain are OpenMP via `-fopenmp --offload-arch=sm_70` and a cross-compiled build via `--target=nvptx64-nvidia-cuda -march=sm_70`. Both of these do not rely directly on things that would change significantly between CUDA versions, and the way they are built can sometimes make this warning print many times. This patch changes the behaviour to only check for the version when building for CUDA offloading specifically, the other two will not have this check. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D155606	2023-07-18 13:48:11 -05:00
Artem Belevich	ffb635cb2d	[CUDA] bump supported CUDA version to 12.1/11.8 Differential Revision: https://reviews.llvm.org/D151361	2023-05-25 11:57:55 -07:00
Artem Belevich	a825f3754b	[CUDA] Relax restrictions on GPU-side variadic functions Allow parsing GPU-side variadic functions when we're compiling with CUDA-9 or newer. We still do not allow accessing variadic arguments. CUDA-9 was the version which introduced PTX-6.0 which allows implementing variadic functions, so older versions can't have variadics in principle. This is required for dealing with headers in recent CUDA versions that rely on variadic function declarations in some of the templated code in libcu++. E.g. https://github.com/llvm/llvm-project/issues/58410 Differential Revision: https://reviews.llvm.org/D150718	2023-05-17 12:51:01 -07:00
Joseph Huber	c2c917f7f6	[Clang] Change default triple to LLVM_HOST_TRIPLE for the CUDA toolchain When cross-compiling NVPTX we use the triple to indicate which paths to search for the CUDA toolchain. Currently this uses the default target triple. This might not be exactly correct, as this is the default triple used to compile binaries, not the host system. We want the host triple because it indicates which folders should hold CUDA. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D150136	2023-05-08 15:54:50 -05:00
Joseph Huber	f05ce9045a	[NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors This patch mostly adapts the existing AMDGPUCtorDtorLoweringPass for use by the Nvidia backend. This pass transforms the ctor / dtor list into a kernel call that can be used to invoke those functions. Furthermore, we emit globals such that the names and addresses of these constructor functions can be found by the driver. Unfortunately, since NVPTX has no way to emit variables at a named section, nor a functioning linker to provide the begin / end symbols, we need to mangle these names and have an external application find them. This work is related to the work in D149398 and D149340. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D149451	2023-05-04 07:13:00 -05:00
Kiran Chandramohan	ab49747f9d	[NFC][Clang] Move DebugOptions to llvm/Frontend for reuse in Flang This patch moves the Debug Options to llvm/Frontend so that it can be shared by Flang as well. Reviewed By: kiranchandramohan, awarzynski Differential Revision: https://reviews.llvm.org/D142347	2023-03-29 12:01:54 +00:00
Joseph Huber	861764b1c5	[NVPTX] Fix NVPTX output name in the driver with `-save-temps` Summary: Currently, OpenMP and direct compilation use an NVPTX toolchain to directly invoke the CUDA tools from Clang to do the assembling and linking of NVPTX codes. This breaks under `-save-temps` because of a workaround. The `nvlink` linker does not accept `.o` files, so we need to be selective when we output these. The previous logic keyed off of presence in the temp files and wasn't a great solution. Change this to just query the input args for `-c` to see if we stop at the assembler. Fixes https://github.com/llvm/llvm-project/issues/60767	2023-02-15 07:39:59 -06:00
Archibald Elliott	d768bf994f	[NFC][TargetParser] Replace uses of llvm/Support/Host.h The forwarding header is left in place because of its use in `polly/lib/External/isl/interface/extract_interface.cc`, but I have added a GCC warning about the fact it is deprecated, because it is used in `isl` from where it is included by Polly.	2023-02-10 09:59:46 +00:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Joseph Huber	db202286eb	[Clang][NFC] Fix out-of-date comments on 'clang-offload-bundler' Summary: These comments are confusing as the `clang-offload-bundler` is no longer used by these toolchains.	2023-01-26 13:03:01 -06:00
Joseph Huber	11656f204a	[CUDA] Fix output from `ptxas` being removed as a temporary file Summary: The logic here is to add the `.cubin` temporary file if we had to create a new filename to handle it. Unfortunately the logic was wrong because we compare `const char *` values here. This logic seems to have been wrong for some time, but was never noticed since we never used the relocatable output. Fixes https://github.com/llvm/llvm-project/issues/60301	2023-01-25 16:24:30 -06:00
Joseph Huber	0660397e68	[CUDA] Allow targeting NVPTX directly without a host toolchain Currently, the NVPTX compilation toolchain can only be invoked either through CUDA or OpenMP via `--offload-device-only`. This is because we cannot build a CUDA toolchain without an accompanying host toolchain for the offloading. When using `--target=nvptx64-nvidia-cuda` this results in generating calls to the GNU assembler and linker, leading to errors. This patch abstracts the portions of the CUDA toolchain that are independent of the host toolchain or offloading kind into a new base class called `NVPTXToolChain`. We still need to read the host's triple to build the CUDA installation, so if not present we just assume it will match the host's system for now, or the user can provide the path explicitly. This should allow the compiler driver to create NVPTX device images directly from C/C++ code. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D140158	2023-01-18 18:18:25 -06:00
Joseph Huber	debfa43117	[Clang][NFC] Clang-format CUDA toolchain file Summary: This file is not formatted, which makes further changes to it more difficult. Format it.	2023-01-18 17:15:03 -06:00
Joseph Huber	52b9a39742	[OpenMP] Make `-fopenmp-targets=` use the `nvptx-arch` tool Previously, if the user did not provide an architecture when using `-fopenmp-targets=nvptx64` we used the value from `CLANG_OPENMP_DEFAULT_NVPTX_ARCH` which is defined at compile time. This isn't ideal because it means that the default is set when the LLVM compiler is built. Instead this patch uses the `nvptx-arch` tool to query it at runtime. This matches the existing behaviour of the AMDGPU toolchain with its `amdgpu-arch` tool. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D141708	2023-01-13 16:52:06 -06:00
Joseph Huber	26d62674cf	[Clang] Explicitly move returned values converted to expected Summary: These can cause failures on GCC-7 it seems. We should explicitly move them to prevent this from causing build failures.	2023-01-12 14:38:03 -06:00
Kadir Cetinkaya	4921b0a285	[clang][Driver][CUDA] Get rid of unused LibPath LibPath discovered during InstallationDetection wasn't used anywhere. Moreover it actually resulted in discarding installations that don't have any `/lib` directory. This is causing troubles for our pipelines downstream, that want to perform syntax-only analysis on the sources. Differential Revision: https://reviews.llvm.org/D141467	2023-01-12 10:36:43 +01:00
Joseph Huber	56ebfca4bc	[CUDA][HIP] Add support for `--offload-arch=native` to CUDA and refactor This patch adds basic support for `--offload-arch=native` to CUDA. This is done using the `nvptx-arch` tool that was introduced previously. Some of the logic for handling executing these tools was factored into a common helper as well. This patch does not add support for OpenMP or the "new" driver. That will be done later. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D141051	2023-01-11 10:30:30 -06:00
Kazu Hirata	6eb0b0a045	Don't include Optional.h These files no longer use llvm::Optional.	2022-12-14 21:16:22 -08:00
Kazu Hirata	22731dbd75	[clang] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 20:31:05 -08:00
Fangrui Song	2c5d49cffc	[Driver] llvm::Optional => std::optional and change a few referenced Basic and llvm/lib/WindowsDriver API	2022-12-03 20:17:05 +00:00
Artem Belevich	9a01cca660	Add support for CUDA-11.8 and sm_{87,89,90} GPUs. Differential Revision: https://reviews.llvm.org/D135306	2022-10-07 13:59:28 -07:00
Joseph Huber	937aaead87	[CUDA] Fix arguments after removing unused private variable Summary: A previous patch removed the use of the `OK` private variable in CUDA which resulted in unused variable warnings. This was fixed in f886f7e8ef7aa4f54298db792a656373af90440c but did not change the constructor to accurately represent its removal. This patch removes it from the interface entirely.	2022-08-26 15:28:34 -05:00
Sterling Augustine	f886f7e8ef	Remove unused private variable.	2022-08-26 12:43:05 -07:00
Joseph Huber	47166968db	[OpenMP] Deprecate the old driver for OpenMP offloading Recently OpenMP has transitioned to using the "new" driver which primarily merges the device and host linking phases into a single wrapper that handles both at the same time. This replaced a few tools that were only used for OpenMP offloading, such as the `clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver carries some marked benefits compared to the old driver that is now being deprecated. Things like device-side LTO, static library support, and more compatible tooling. As such, we should be able to completely deprecate the old driver, at least for OpenMP. The old driver support will still exist for CUDA and HIP, although both of these can currently be compiled on Linux with `--offload-new-driver` to use the new method. Note that this does not deprecate the `clang-offload-bundler`, although it is unused by OpenMP now, it is still used by the HIP toolchain both as their device binary format and object format. When I proposed deprecating this code I heard some vendors voice concerns about needing to update their code in their fork. They should be able to just revert this commit if it lands. Reviewed By: jdoerfert, MaskRay, ye-luo Differential Revision: https://reviews.llvm.org/D130020	2022-08-26 13:47:09 -05:00
John Ericson	3adda398ce	[clang][lldb][cmake] Use new `_INSTALL_LIBDIR_BASENAME` CPP macro Use this instead of `_LIBDIR_SUFFIX`, from which it is computed. This gets us ready for D130586, in which `*_LIBDIR_SUFFIX` is deprecated. Differential Revision: https://reviews.llvm.org/D132300	2022-08-20 12:52:21 -04:00
John Ericson	e941b031d3	Revert "[cmake] Use `CMAKE_INSTALL_LIBDIR` too" This reverts commit f7a33090a91015836497c75f173775392ab0304d. Unfortunately this causes a number of failures that didn't show up in my local build.	2022-08-18 22:46:32 -04:00
John Ericson	f7a33090a9	[cmake] Use `CMAKE_INSTALL_LIBDIR` too We held off on this before as `LLVM_LIBDIR_SUFFIX` conflicted with it. Now we return this. `LLVM_LIBDIR_SUFFIX` is kept as a deprecated way to set `CMAKE_INSTALL_LIBDIR`. The other `*_LIBDIR_SUFFIX` are just removed entirely. I imagine this is too potentially-breaking to make LLVM 15. That's fine. I have a more minimal version of this in the distro (NixOS) patches for LLVM 15 (like previous versions). This more expansive version I will test harder after the release is cut. Reviewed By: sebastian-ne, ldionne, #libc, #libc_abi Differential Revision: https://reviews.llvm.org/D130586	2022-08-18 15:33:35 -04:00
Joseph Huber	3b52341116	[CUDA] Fix output name being replaced in device-only mode When performing device only compilation, there was an issue where `cubin` outputs were being renamed to `cubin` despite the user's name. This is required in a normal compilation flow as the Nvidia tools only understand specific filenames instead of checking magic bytes for some unknown reason. We do not want to perform this transformation when the user is performing device only compilation. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D131278	2022-08-05 19:08:41 -04:00
Joseph Huber	56ab966a04	[CUDA] Stop adding CUDA features twice We currently call the `addNVPTXFeatures` function in two places, inside of the CUDA Toolchain and inside of Clang in the standard entry point. We normally add features to the job in Clang, so the call inside of the CUDA toolchain is redundant and results in `+ptx` features being added. Since we remove this call, we no longer will have a cached CUDA installation so we will usually create it twice. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D128752	2022-06-29 09:34:09 -04:00
Joseph Huber	4d3c010f1d	[CUDA] Do not embed a fatbinary when using the new driver Previously, when using the new driver we created a fatbinary with the PTX and Cubin output. This was mainly done in an attempt to create some backwards compatibility with the existing CUDA support that embeds the fatbinary in each TU. This will most likely be more work than necessary to actually implement. The linker wrapper cannot do anything with these embedded PTX files because we do not know how to link them, and if we did want to include multiple files it should go through the `clang-offload-packager` instead. Also this didn't respect the setting that disables embedding PTX (although it wasn't used anyway). Reviewed By: tra Differential Revision: https://reviews.llvm.org/D128441	2022-06-23 15:40:43 -04:00
Joseph Huber	4205f4aba4	[Cuda] Add the features using the last argument Summary: We should use the last argument so this flag can be overridden properly.	2022-05-13 18:05:02 -04:00
Joseph Huber	7dc23abbd3	[CUDA] Add a flag to manually specify the target feature to use with CUDA Summary: Normally we parse through the CUDA installation to discover the needed features. However, we may want to build libraries on targets that do not currently have CUDA installed but still need to know which features to make use of when creating the PTX or bitcode. This flag is a simple way to specify this so we can compile certain codes without a valid CUDA installation. Ideally this could be done via an -Xarch or similar flag but currently they cannot handle this. We would need to support using an -Xarch flag that takes multiple arguments that then pass them to the -Xclang functionality.	2022-05-13 16:30:58 -04:00
Joseph Huber	8477a0d769	[OpenMP] Allow compiling multiple target architectures with OpenMP This patch adds support for OpenMP to use the `--offload-arch` and `--no-offload-arch` options. Traditionally, OpenMP has only supported compiling for a single architecture via the `-Xopenmp-target` option. Now we can pass in a bound architecture and use that if given, otherwise we default to the value of the `-march` option as before. Note that this only applies the basic support, the OpenMP target runtime does not yet know how to choose between multiple architectures. Additionally other parts of the offloading toolchain (e.g. LTO) require the `-march` option, these should be worked out later. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D124721	2022-05-06 16:57:16 -04:00
Shilei Tian	20a9fb953e	[Clang][OpenMP] Fix the issue that temp cubin files are not removed after compilation when using new OpenMP driver The root cause of this is, in `NVPTX::Assembler::ConstructJob`, the output file name might not match the `Output`'s file name passed into the function because `CudaToolChain::getInputFilename` is a specialized version. That means the real output file is not added to the temp files list, which will be all removed in the d'tor of `Compilation`. In order to "fix" it, in the function `NVPTX::OpenMPLinker::ConstructJob`, before calling `clang-nvlink-wrapper`, the function calls `getToolChain().getInputFilename(II)` to get the right output file name for each input, and adds it to the temp files, and then they can be removed w/o any issue. However, this whole logic doesn't work when using the new OpenMP driver because `NVPTX::OpenMPLinker::ConstructJob` is not called at all, which causes the cubin file generated in each single unit compilation to go untracked. In this patch, we add the real output file into temp files if its name doesn't match `Output`. We add it when the file is an output instead of doing it when it is an input, like what we did in `NVPTX::OpenMPLinker::ConstructJob`, which makes more sense. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D124253	2022-04-22 18:07:28 -04:00
Joseph Huber	d912232741	[CUDA][FIX] Fix name conflict in getNVPTXTargetFeatures Summary: There was a naming conflict in the getNVPTXTargetFeatures function that prevented some compilers from correctly disambiguating between the enumeration and variable of the same name. Rename the variable to avoid this.	2022-03-23 23:07:51 -04:00
Joseph Huber	a3248e4b28	[CUDA] Add getTargetFeatures for the NVPTX toolchain The NVPTX toolchain uses target features to determine the PTX version to use. However this isn't exposed externally like most other toolchain specific target features are. Add this functionality in preparation for using it for OpenMP offloading. Reviewed By: jdoerfert, tra Differential Revision: https://reviews.llvm.org/D122089	2022-03-21 16:32:36 -04:00
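
Commit 11656f204a above attributes the lost `.cubin` file to a comparison of `const char *` values. As a rough illustration of that class of bug only (this is not the driver's code, and every function name below is hypothetical), comparing two C strings with `==` compares their addresses, so two buffers holding the same file name are reported as different:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: compares the two pointers, not the text they point to.
bool sameNameByPointer(const char *A, const char *B) { return A == B; }

// Hypothetical helper: compares the characters of the two names.
bool sameNameByContent(const char *A, const char *B) {
  return std::strcmp(A, B) == 0;
}

// Two distinct arrays with identical contents: the pointer comparison
// reports a mismatch while the content comparison reports a match.
void demoFilenameComparison() {
  char Output[] = "kernel.cubin";
  char Renamed[] = "kernel.cubin";
  assert(!sameNameByPointer(Output, Renamed));
  assert(sameNameByContent(Output, Renamed));
}
```

In LLVM code the usual way to avoid this is to compare `llvm::StringRef` or `std::string` values rather than raw pointers; whether the actual fix took that form is not stated in the commit message.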

166 Commits