Pankaj Dwivedi 28d4e33b65
[AMDGPU][SIInsertWaitCnt] Optimize loadcnt insertion at function boundaries (#169647)
On GFX12+, GLOBAL_INV increments the loadcnt counter but does not write
results to any VGPRs. Previously, we unconditionally inserted
s_wait_loadcnt 0 at function returns even when the only pending loadcnt
was from GLOBAL_INV instructions.

This patch optimizes waitcnt insertion by skipping the loadcnt wait at
function boundaries when no VGPRs have pending loads. This is determined
by checking if any VGPR has a score greater than the lower bound for
LOAD_CNT - if not, the pending loadcnt must be from non-VGPR-writing
instructions like GLOBAL_INV.

The optimization is limited to GFX12+ targets where GLOBAL_INV exists
and uses the extended wait count instructions.

This is a follow-up optimization to PR #135340 which added tracking for
GLOBAL_INV in the waitcnt pass.
2025-12-17 17:53:00 +05:30

+==============================================================================+
| How to organize the lit tests                                                |
+==============================================================================+

- If you write a test that matches a single DAG opcode or intrinsic, it should
  go in a file called {opcode_name,intrinsic_name}.ll (e.g. fadd.ll).

- If you write a test that matches several DAG opcodes and checks for a single
  ISA instruction, then that test should go in a file called {ISA_name}.ll (e.g.
  bfi_int.ll).

- For all other tests, use your best judgement for organizing tests and naming
  the files.

+==============================================================================+
| Naming conventions                                                           |
+==============================================================================+

- Use dash '-' and not underscore '_' to separate words in file names, unless
  the file is named after a DAG opcode or ISA instruction that has an
  underscore '_' in its name.
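
The naming rule above could be checked mechanically. The following is a
minimal sketch, not part of the test suite: the function name and the
allow-list are hypothetical, and the allow-list holds only the one
underscore-containing name this README mentions (bfi_int). A real check would
need the full set of DAG opcode and ISA instruction names.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list of file stems that may keep an underscore because
# they are named after a DAG opcode or ISA instruction. Only "bfi_int" comes
# from the README; extend this set for real use.
UNDERSCORE_NAMES = {"bfi_int"}


def follows_naming_convention(filename, allowed=UNDERSCORE_NAMES):
    """Return True if the test file name obeys the dash-not-underscore rule."""
    stem = PurePosixPath(filename).stem  # "bfi_int.ll" -> "bfi_int"
    if "_" not in stem:
        return True  # dash-separated or single-word names are always fine
    # An underscore is acceptable only for an opcode or instruction name.
    return stem in allowed
```

For example, follows_naming_convention("insert-waitcnt.ll") and
follows_naming_convention("bfi_int.ll") are both accepted, while
"insert_waitcnt.ll" is rejected under the assumed allow-list.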