WMMA XDL instructions are tracked as TRANs ops and the compiler should
consider them the same as TRANS in S_DELAY_ALU insertion. We use a searchable
table for the InsertDelayAlu pass to recognize these WMMA XDL instructions.
Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
instructionWaitsForSGPRWrites function covers ALL SALU instructions,
including those like s_waitcnt that don't read from sgpr. This results
in removing delay_alu instructions in cases like VALU->SGPR->VALU, which
results in performance regression. Change modifies the function so that
it checks if instruction also reads a sgpr.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
We have a VALU->SGPR->SALU (VALU writing to SGPR and SALU reading from
it). When VALU is issued, it increments internal counter VA_SDST used to
track use of this SGPR. SALU will not issue until VA_SDST is zero, that
is when VALU is finished writing. Therefore, delays added by s_delay_alu
are not needed in this situation.
We have a VALU->SGPR->SALU (VALU writing to SGPR and SALU reading from
it). When VALU is issued, it increments internal counter VA_SDST used to
track use of this SGPR. SALU will not issue until VA_SDST is zero, that
is when VALU is finished writing. Therefore, delays added by s_delay_alu
are not needed in this situation.
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*()
for each of vm_vsrc, va_vdst and sa_sdst. These are now used in
AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with
S_WAITCNT_DEPCTR operands easier and more readable.
Differential Revision: https://reviews.llvm.org/D154424
This just tweaks the fix for D145232 to make the limit more precise, so
that we could actually emit a delay of 3 SALU cycles (the maximum) if we
had any SALU instructions that required it.
Not sure why this was broken, but I was seeing this linker error:
ld64.lld: error: undefined symbol: (anonymous namespace)::AMDGPUInsertDelayAlu::DelayInfo::SALU_CYCLES_MAX
>>> referenced by AMDGPUInsertDelayAlu.cpp:129 (/Users/matt/src/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:129)