20 Commits

Author SHA1 Message Date
Mariusz Sikora
3c0f5045e1
[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567)
For now list of features is based on gfx12 and gfx1250

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-01-26 14:16:36 +01:00
Jay Foad
deacb03b28
[AMDGPU] Do not use s_delay_alu instskip to skip s_set_vgpr_msb (#175925)
Fixes: SWDEV-576227
2026-01-20 10:59:55 +00:00
Sergei Barannikov
86d712cda4
[AMDGPU] Use MCRegUnit, insert explicit casts to/from unsigned (NFC) (#167889)
The casts are currently no-op because `MCRegUnit` is a typedef'ed to
`unsigned`, but this will change soon enough and explicit cast will be
required.
2025-11-13 21:39:02 +03:00
Changpeng Fang
b52cf756ce
AMDGPU: Treat WMMA XDL ops as TRANS in S_DELAY_ALU insertion for gfx1250 (#149208)
WMMA XDL instructions are tracked as TRANs ops and the compiler should
consider them the same as TRANS in S_DELAY_ALU insertion. We use a searchable
table for the InsertDelayAlu pass to recognize these WMMA XDL instructions.

Co-authored-by: Stefan Stipanovic <Stefan.Stipanovic@amd.com>
2025-07-16 17:07:48 -07:00
Jay Foad
e717e503ca
[AMDGPU] Fix comment on DelayInfo::advance (#146718) 2025-07-02 17:18:57 +01:00
Ana Mihajlovic
08d747c1ef
[AMDGPU] Fix bad removal of s_delay_alu (#145728)
instructionWaitsForSGPRWrites function covers ALL SALU instructions,
including those like s_waitcnt that don't read from sgpr. This results
in removing delay_alu instructions in cases like VALU->SGPR->VALU, which
results in performance regression. Change modifies the function so that
it checks if instruction also reads a sgpr.
2025-06-27 16:15:10 +02:00
Kazu Hirata
1b189cab5e
[llvm] Use *Set::insert_range (NFC) (#132509)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
2025-03-22 08:07:33 -07:00
Ana Mihajlovic
459b4e3fe1
Reland "[AMDGPU] Remove s_delay_alu for VALU->SGPR->SALU (#127212)" (#131111)
We have a VALU->SGPR->SALU (VALU writing to SGPR and SALU reading from
it). When VALU is issued, it increments internal counter VA_SDST used to
track use of this SGPR. SALU will not issue until VA_SDST is zero, that
is when VALU is finished writing. Therefore, delays added by s_delay_alu
are not needed in this situation.
2025-03-13 10:26:20 +01:00
Kazu Hirata
aa008e0008 Revert "[AMDGPU] Remove s_delay_alu for VALU->SGPR->SALU (#127212)"
This reverts commit 71582c6667a6334c688734cae628e906b3c1ac1d.

Multiple buildbot failures have been reported:
https://github.com/llvm/llvm-project/pull/127212
2025-03-12 12:09:09 -07:00
Ana Mihajlovic
71582c6667
[AMDGPU] Remove s_delay_alu for VALU->SGPR->SALU (#127212)
We have a VALU->SGPR->SALU (VALU writing to SGPR and SALU reading from
it). When VALU is issued, it increments internal counter VA_SDST used to
track use of this SGPR. SALU will not issue until VA_SDST is zero, that
is when VALU is finished writing. Therefore, delays added by s_delay_alu
are not needed in this situation.
2025-03-12 09:33:07 -07:00
Kazu Hirata
6cb2f6de9b
[AMDGPU] Avoid repeated hash lookups (NFC) (#130235) 2025-03-06 23:58:16 -08:00
Akshat Oke
852923822f
[AMDGPU][NewPM] Port AMDGPUInsertDelayAlu to NPM (#128003) 2025-02-26 09:50:09 +05:30
Juan Manuel Martinez Caamaño
d617371375
[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859)
Instead of allocating an initializing a new instance in
`GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.
2024-10-02 18:17:27 +02:00
Stephen Thomas
8aedad0fa0 [AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*()
for each of vm_vsrc, va_vdst and sa_sdst. These are now used in
AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with
S_WAITCNT_DEPCTR operands easier and more readable.

Differential Revision: https://reviews.llvm.org/D154424
2023-07-04 11:02:12 +01:00
Sergei Barannikov
aa2d0fbc30 [MC] Add MCRegisterInfo::regunits for iteration over register units
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152098
2023-06-16 05:39:50 +03:00
Jay Foad
7ba61eaf34 [AMDGPU] More precise limit on SALU cycles in s_delay_alu instructions
This just tweaks the fix for D145232 to make the limit more precise, so
that we could actually emit a delay of 3 SALU cycles (the maximum) if we
had any SALU instructions that required it.
2023-03-05 08:14:15 +00:00
Matt Arsenault
ce3d93e4be AMDGPU: Use static constexpr instead of static const
Not sure why this was broken, but I was seeing this linker error:

ld64.lld: error: undefined symbol: (anonymous namespace)::AMDGPUInsertDelayAlu::DelayInfo::SALU_CYCLES_MAX
>>> referenced by AMDGPUInsertDelayAlu.cpp:129 (/Users/matt/src/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:129)
2023-03-03 18:50:34 -04:00
Jay Foad
7442f8635b [AMDGPU] Fix invalid instid value in s_delay_alu instruction
Differential Revision: https://reviews.llvm.org/D145232
2023-03-03 21:08:26 +00:00
Jay Foad
a07584d57d [CodeGen] Make more use of MachineOperand::getOperandNo. NFC.
Differential Revision: https://reviews.llvm.org/D143252
2023-02-07 11:50:57 +00:00
Jay Foad
cfb7ffdec0 [AMDGPU] New AMDGPUInsertDelayAlu pass
Differential Revision: https://reviews.llvm.org/D128270
2022-06-29 21:30:20 +01:00