llvm-project

Author	SHA1	Message	Date
Joseph Huber	ffd6a13b5f	[compiler-rt] Rework profile data handling for GPU targets (#187136 ) Summary: Currently, the GPU iterates through all of the present symbols and copies them by prefix. This is inefficient as it requires a lot of small high-latency data transfers rather than a few large ones. Additionally, we force every single profiling symbol to have protected visibility. This means potentially hundreds of unnecessary symbols in the symbol table. This PR changes the interface to move towards the start / stop section handling. AMDGPU supports this natively as an ELF target, so we need little changes. Instead of overriding visibility, we use a single table to define the bounds that we can obtain with one contiguous load. Using a table interface should also work for the in-progress HIP implementation for this, as it wraps the start / stop sections into standard void pointers which will be inside of an already mapped region of memory, so they should be accessible from the HIP API. NVPTX is more difficult as it is an ELF platform without this support. I have hooked up the 'Other' handling to work around this, but even then it's a bit of a stretch. I could remove this support here, but I wanted to demonstrate that we can share the ABI. However, NVPTX will only work if we force LTO and change the backend to emit variables in the same TL;DR, we now do this: ```c struct { start1, stop1, start2, stop2, start3, stop3, version; } device; struct host = DtoH(lookup("device")); counters = DtoH(host.stop - host.start) version = DtoH(host.version); ```	2026-03-26 10:17:43 -05:00
Ross Brunton	abb878438a	[Offload] Allow querying the size of globals (#147698 ) The `GlobalTy` helper has been extended to make both the Size and Ptr be optional. Now `getGlobalMetadataFromDevice`/`Image` is able to write the size of the global to the struct, instead of just verifying it.	2025-07-10 12:05:31 +01:00
Ethan Luis McDonough	67ff66e677	[PGO][Offload] Fix offload coverage mapping (#143490 ) This pull request fixes coverage mapping on GPU targets. - It adds an address space cast to the coverage mapping generation pass. - It reads the profiled function names from the ELF directly. Reading it from public globals was causing issues in cases where multiple device-code object files are linked together.	2025-06-10 20:19:38 -05:00
Ethan Luis McDonough	c50d39f073	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268 ) This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes: - Allows PGO flags to be supplied to GPU targets - Pulls version global from device - Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden	2025-03-19 19:01:38 -05:00
Ethan Luis McDonough	9e5c136d5a	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365 ) This pull request is the second part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes: - Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files. - Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file. - Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set	2025-02-11 23:30:54 -06:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extends PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587 ) This pull request is the first part of an ongoing effort to extends PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
Johannes Doerfert	330d8983d2	[Offload] Move `/openmp/libomptarget` to `/offload` (#75125 ) In a nutshell, this moves our libomptarget code to populate the offload subproject. With this commit, users need to enable the new LLVM/Offload subproject as a runtime in their cmake configuration. No further changes are expected for downstream code. Tests and other components still depend on OpenMP and have also not been renamed. The results below are for a build in which OpenMP and Offload are enabled runtimes. In addition to the pure `git mv`, we needed to adjust some CMake files. Nothing is intended to change semantics. ``` ninja check-offload ``` Works with the X86 and AMDGPU offload tests ``` ninja check-openmp ``` Still works but doesn't build offload tests anymore. ``` ls install/lib ``` Shows all expected libraries, incl. - `libomptarget.devicertl.a` - `libomptarget-nvptx-sm_90.bc` - `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git` - `libomptarget.so` -> `libomptarget.so.18git` Fixes: https://github.com/llvm/llvm-project/issues/75124 --------- Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>	2024-04-22 09:51:33 -07:00

10 Commits