On GFX12+, GLOBAL_INV increments the loadcnt counter but does not write results to any VGPRs. Previously, we unconditionally inserted s_wait_loadcnt 0 at function returns even when the only pending loadcnt was from GLOBAL_INV instructions. This patch optimizes waitcnt insertion by skipping the loadcnt wait at function boundaries when no VGPRs have pending loads. This is determined by checking if any VGPR has a score greater than the lower bound for LOAD_CNT - if not, the pending loadcnt must be from non-VGPR-writing instructions like GLOBAL_INV. The optimization is limited to GFX12+ targets where GLOBAL_INV exists and uses the extended wait count instructions. This is a follow-up optimization to PR #135340 which added tracking for GLOBAL_INV in the waitcnt pass.
The LLVM Compiler Infrastructure ================================ This directory and its subdirectories contain source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and runtime environments. LLVM is open source software. You may freely distribute it under the terms of the license agreement found in LICENSE.txt. Please see the documentation provided in docs/ for further assistance with LLVM, and in particular docs/GettingStarted.rst for getting started with LLVM and docs/README.txt for an overview of LLVM's documentation setup. If you are writing a package for LLVM, see docs/Packaging.rst for our suggestions.