llvm-project

Author	SHA1	Message	Date
Yaxun (Sam) Liu	33a6ce1837	[HIP] Allow partial linking for `-fgpu-rdc` (#81700) `-fgpu-rdc` mode allows device functions to call device functions in a different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time consuming for the AMDGPU backend since it only supports LTO. There are use cases where objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, while partially linking the host code with `ld -r` or generating a static library by using the `--emit-static-lib` option of clang. This avoids linking all device code together and therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emitted an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits a unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects is linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fat binary symbols only, to avoid repeated linking. `__hip_gpubin_handle` is also uniquified and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018	2024-02-22 13:51:31 -05:00
Yaxun (Sam) Liu	f2677afe91	[CUDA][HIP] Externalize device var in anonymous namespace Device variables in an anonymous namespace may be referenced by host code, therefore they need to be externalized in a similar way to static device variables or kernels in an anonymous namespace. Fixes: https://github.com/ROCm-Developer-Tools/HIP/issues/3246 Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D152164	2023-06-06 12:03:48 -04:00
Nikita Popov	0419465fa4	[Clang] Update some CUDA tests to opaque pointers (NFC)	2022-12-13 11:50:08 +01:00
Nikita Popov	532dc62b90	[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC) This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be part of the migration approach described in https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9. The patch has been produced by replacing %clang_cc1 with %clang_cc1 -no-opaque-pointers for tests that fail with opaque pointers enabled. Worth noting that this doesn't cover all tests, there's a remaining ~40 tests not using %clang_cc1 that will need a followup change. Differential Revision: https://reviews.llvm.org/D123115	2022-04-07 12:09:47 +02:00
Michael Liao	4e5d9c8803	[Internalize] Preserve variables externally initialized. - ``externally_initialized`` variables would be initialized or modified elsewhere. Particularly, CUDA or HIP may have host code to initialize or modify ``externally_initialized`` device variables, which may not be explicitly referenced on the device side but may still be used through the host side interfaces. Not preserving them triggers the elimination of them in the GlobalDCE and breaks the user code. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D105135	2021-07-08 10:48:19 -04:00
Yaxun (Sam) Liu	4cb42564ec	[CUDA][HIP] Fix device variables used by host. Variables emitted on both host and device side with different addresses, when ODR-used by a host function, should not cause the device-side counterpart to be force emitted. This fixes the regression caused by https://reviews.llvm.org/D102237 Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D102801	2021-05-20 17:04:29 -04:00
Yaxun (Sam) Liu	98575708da	[CUDA][HIP] Fix device template variables Currently clang does not emit device template variables instantiated only in host functions, however, nvcc is able to do that: https://godbolt.org/z/fneEfferY This patch fixes this issue by refactoring and extending the existing mechanism for emitting static device variables ODR-used by host only. Basically clang records device variables ODR-used by host code and forces them to be emitted in device compilation. The existing mechanism makes sure these device variables ODR-used by host code are added to llvm.compiler.used, therefore they are guaranteed not to be deleted. It also fixes non-ODR-use of a static device variable by host code causing the static device variable to be emitted and registered, which should not happen. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D102237	2021-05-12 11:13:29 -04:00
Yaxun (Sam) Liu	d5c0f00e21	[CUDA][HIP] Mark device var used by host only Add device variables to llvm.compiler.used if they are ODR-used by either host or device functions. This is necessary to prevent them from being eliminated by whole-program optimization where the compiler has no way to know a device variable is used by some host code. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D98814	2021-04-17 11:25:25 -04:00

8 Commits