Examine instructions in the pending queue when scheduling.

This makes instructions visible to the scheduling heuristics even when they cannot be issued immediately because of hardware resource constraints.

The scheduler has two hardware resource modeling modes: an in-order mode, where an instruction must be ready to issue before it can be scheduled, and out-of-order models, where instructions are always visible to the heuristics. Out-of-order models handle unbuffered processor resources specially: using such a resource back-to-back can stall the pipeline, so the scheduler typically avoids doing so. On AMDGPU targets, however, managing register pressure and reducing spilling is critical enough to justify an exception to this approach.

This change lets the scheduler examine instructions that cannot be issued immediately because they use an unbuffered resource that is already occupied. Making these instructions visible to the heuristics anyway gives the scheduler more flexibility, potentially allowing better management of register pressure and hardware resources.
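The idea can be illustrated with a small standalone C++ sketch. This is a simplified model, not LLVM's scheduler code: `Instr`, `Pick`, `pickNext`, `issueCycle`, and the `examinePending` switch are hypothetical names introduced for illustration, and the only heuristic modeled is "prefer the instruction that lowers register pressure the most".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model; these types are NOT part of LLVM.
struct Instr {
  std::string name;
  int pressureDelta;    // change in live registers if scheduled (negative frees registers)
  bool usesUnbuffered;  // true if it needs the single unbuffered resource
};

struct Pick {
  int index;         // index into the candidate list, or -1 if nothing was picked
  bool fromPending;  // true if the pick was blocked on the unbuffered resource
};

// Returns true if candidate `a` should be preferred over candidate `b`.
// Lower pressure delta wins; on a tie, an unblocked candidate wins.
static bool better(const Instr &a, bool aBlocked, const Instr &b, bool bBlocked) {
  if (a.pressureDelta != b.pressureDelta)
    return a.pressureDelta < b.pressureDelta;
  return !aBlocked && bBlocked;
}

// Pick the next instruction at `cycle`. `resourceFreeAt` is the first cycle
// at which the unbuffered resource is free again. With `examinePending`
// false, candidates blocked on that resource are invisible to the heuristic.
Pick pickNext(const std::vector<Instr> &candidates, int cycle,
              int resourceFreeAt, bool examinePending) {
  assert(cycle >= 0 && "cycle must be non-negative");
  Pick best{-1, false};
  for (int i = 0; i < static_cast<int>(candidates.size()); ++i) {
    bool blocked = candidates[i].usesUnbuffered && resourceFreeAt > cycle;
    if (blocked && !examinePending)
      continue;  // hidden from the heuristic
    if (best.index < 0 ||
        better(candidates[i], blocked, candidates[best.index], best.fromPending))
      best = Pick{i, blocked};
  }
  return best;
}

// Cycle at which the picked instruction can actually issue: a pick taken
// from the pending set has to wait until the resource is free.
int issueCycle(const Pick &p, int cycle, int resourceFreeAt) {
  return p.fromPending ? std::max(cycle, resourceFreeAt) : cycle;
}
```

For example, with one candidate that frees registers but needs the occupied resource and one that raises pressure but can issue now, `pickNext(candidates, 3, 9, false)` selects the issuable one, while `pickNext(candidates, 3, 9, true)` selects the blocked one and `issueCycle` reports that it waits until cycle 9. The cycle values 3 and 9 are illustrative only; the sketch trades a stall for lower pressure unconditionally, which a real heuristic would weigh against other costs.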
33 lines
1.2 KiB
YAML
# RUN: llc -march=amdgcn -mcpu=gfx908 -run-pass machine-scheduler --misched-prera-direction=topdown -verify-machineinstrs %s -o - -debug-only=machine-scheduler 2>&1 | FileCheck %s
# REQUIRES: asserts

# Check that cycle counts are consistent with hazards.

# CHECK: Cycle: 3 TopQ.A
# CHECK: hazard: SU(6) HWXDL[0]=9c, is later than CurrCycle = 3c
# CHECK-NOT: Cycle: 9 TopQ.A
# CHECK: Cycle: 83 TopQ.A
# CHECK: Checking pending node SU(6)
# CHECK: Move SU(6) into Available Q

---
name: pending_queue_ready_cycle
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $sgpr4_sgpr5

    %2:sgpr_128 = IMPLICIT_DEF
    %14:vgpr_32 = IMPLICIT_DEF
    %15:vgpr_32 = IMPLICIT_DEF
    %18:areg_512 = IMPLICIT_DEF
    %18:areg_512 = V_MFMA_F32_16X16X1F32_mac_e64 %15, %14, %18, 0, 0, 0, implicit $mode, implicit $exec
    %5:vreg_128 = BUFFER_LOAD_DWORDX4_OFFSET %2, 0, 0, 0, 0, implicit $exec
    %18:areg_512 = V_MFMA_F32_16X16X1F32_mac_e64 %15, %14, %18, 0, 0, 0, implicit $mode, implicit $exec
    undef %84.sub0:vreg_128_align2 = V_ADD_U32_e32 %5.sub0, %14, implicit $exec
    %7:vreg_512 = COPY %18
    SCHED_BARRIER 0
    S_NOP 0, implicit %18, implicit %7, implicit %84
    S_ENDPGM 0
...