236 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
4f34c740ab
[AMDGPU] w/a for s_setreg_b32 gfx1250 hazard with MODE register (#153879) 2025-08-15 16:08:13 -07:00
Stanislav Mekhanoshin
f1fc50748a
[AMDGPU] w/a hazard with writing s102/103 and reading FLAT_SCRATCH_BASE (#153878) 2025-08-15 15:23:06 -07:00
Stanislav Mekhanoshin
1f25c4883e
[AMDGPU] Mitigate DS_ATOMIC_ASYNC_BARRIER_ARRIVE_B64 bug (#153872)
DS_ATOMIC_ASYNC_BARRIER_ARRIVE_B64 shall not be claused (we already do
not clause DS instructions) and needs waits before and after.
2025-08-15 14:17:54 -07:00
Stanislav Mekhanoshin
29976f2e58
[AMDGPU] Handle S_GETREG_B32 hazard on gfx1250 (#153848)
GFX1250 SPG says: S_GETREG_B32 does not wait for idle before executing.
The user must S_WAIT_ALU 0 before S_GETREG_B32 on:
STATUS, STATE_PRIV, EXCP_FLAG_PRIV, or EXCP_FLAG_USER.
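
The rule above can be sketched as a small predicate. This is only an illustration of the register list quoted from the SPG: the `HwReg` enum and the function name are hypothetical and are not the identifiers used in the backend.

```cpp
// Hypothetical enumeration of a few hardware registers readable with
// S_GETREG_B32. The names mirror the commit messages, not the LLVM sources.
enum class HwReg { Mode, Status, StatePriv, ExcpFlagPriv, ExcpFlagUser };

// Returns true when an S_WAIT_ALU 0 is required before an S_GETREG_B32 that
// reads the given register, following the list quoted above.
constexpr bool needsWaitAluBeforeGetreg(HwReg Reg) {
  switch (Reg) {
  case HwReg::Status:
  case HwReg::StatePriv:
  case HwReg::ExcpFlagPriv:
  case HwReg::ExcpFlagUser:
    return true;
  case HwReg::Mode:
    return false;
  }
  return false; // Unknown values: no wait is assumed in this sketch.
}
```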
2025-08-15 11:38:22 -07:00
Stanislav Mekhanoshin
5d28284dbb
[AMDGPU] gfx1250 does not need nop before VGPR dealloc (#153844)
This has no impact as the dealloc is now practically disabled.
2025-08-15 11:29:02 -07:00
Stanislav Mekhanoshin
8bce10ac6d
[AMDGPU] Enable kernarg preload on gfx1250 (#153686) 2025-08-14 16:29:53 -07:00
Stanislav Mekhanoshin
a629119c75
[AMDGPU] Remove wave64 functions (#153690)
gfx1250 only supports wave32.
2025-08-14 15:54:33 -07:00
Stanislav Mekhanoshin
57c1e01e48
[AMDGPU] Don't allow wgp mode on gfx1250 (#153680)
- gfx1250 only supports cu mode
2025-08-14 15:16:56 -07:00
Stanislav Mekhanoshin
b296ea9c14
[AMDGPU] s_get_shader_cycles_u64 gfx1250 instruction (#152390)
It is the same as reading SHADER_CYCLES_LO and SHADER_CYCLES_HI
but with a single instruction.
2025-08-06 15:32:28 -07:00
Stanislav Mekhanoshin
d1b6ce50df
[AMDGPU] gfx1250 has fixed GETPC bug and also extended VA to 57 bits (#152373) 2025-08-06 13:32:26 -07:00
Stanislav Mekhanoshin
c2eddec4ff
[AMDGPU] System scope atomics are emulated over PCIe in gfx1250 (#152369)
HW will emulate unsupported PCIe atomics via CAS loop, we do not need to
expand these anymore.
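
For readers unfamiliar with the term, a CAS loop emulates a read-modify-write atomic using only compare-and-swap. The following is a minimal host-side sketch of that general technique with `std::atomic`. It is neither the hardware's emulation nor the compiler expansion this patch disables, and `fetchAddViaCas` is a hypothetical name.

```cpp
#include <atomic>

// Emulates an atomic fetch-add using only compare-and-swap.
// Returns the value that was stored before the addition.
template <typename T>
T fetchAddViaCas(std::atomic<T> &Target, T Delta) {
  T Old = Target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads Old with the current value,
  // so each retry recomputes Old + Delta from fresh data.
  while (!Target.compare_exchange_weak(Old, Old + Delta,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
  }
  return Old;
}
```

The same loop shape works for any operation that can be recomputed from the old value, which is why it serves as a generic fallback when a native atomic is not available.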
2025-08-06 13:08:12 -07:00
Stanislav Mekhanoshin
334d0be2d4
[AMDGPU] Support 64-bit LDS atomic fadd on gfx1250 (#152368) 2025-08-06 13:07:56 -07:00
Stanislav Mekhanoshin
d08c2977e8
[AMDGPU] Add MC support for new gfx1250 src_flat_scratch_base_lo/hi (#152203) 2025-08-05 14:35:48 -07:00
Stanislav Mekhanoshin
0988510ad4
[AMDGPU] gfx1250 v_perm_pk16_* instructions (#151773) 2025-08-01 20:12:35 -07:00
Harrison Hao
f9b258c73a
[AMDGPU] Support function attribute to override postRA scheduling direction (#147708)
This patch adds support for controlling the post-RA machine scheduler
direction (topdown, bottomup, bidirectional) on a per-function basis
using the "amdgpu-post-ra-direction" function attribute.
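
A minimal sketch of how such an attribute value might be mapped to a scheduling direction. It assumes the attribute strings are spelled exactly as the three directions listed above. `DirectionPolicy` is a hypothetical stand-in that mirrors only the `OnlyTopDown` and `OnlyBottomUp` flags of `MachineSchedPolicy`, and `parsePostRADirection` is not a function from the patch.

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for the direction flags of a scheduling policy.
// Both flags false means the scheduler may work in both directions.
struct DirectionPolicy {
  bool OnlyTopDown = false;
  bool OnlyBottomUp = false;
};

// Maps an attribute value to a direction. Returns std::nullopt for an
// unrecognized value so the caller can keep its default policy.
std::optional<DirectionPolicy> parsePostRADirection(std::string_view Value) {
  if (Value == "topdown")
    return DirectionPolicy{true, false};
  if (Value == "bottomup")
    return DirectionPolicy{false, true};
  if (Value == "bidirectional")
    return DirectionPolicy{false, false};
  return std::nullopt;
}
```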
2025-08-01 16:07:09 +08:00
Stanislav Mekhanoshin
d70f228e83
[AMDGPU] Add gfx1250 V_ADD_{MIN|MAX}_{U|I}32 instructions (#151379) 2025-07-30 13:12:14 -07:00
Stanislav Mekhanoshin
3dfd939a16
[AMDGPU] gfx1250 V_{MIN|MAX}_{I|U}64 opcodes (#151256) 2025-07-29 19:13:51 -07:00
Stanislav Mekhanoshin
d99238263c
[AMDGPU] Implement v_mad_u32/v_mad_nc_u|i64_u32 on gfx1250 (#151226) 2025-07-29 15:06:35 -07:00
Changpeng Fang
6184ef1c2f
[AMDGPU] Support f64 atomics on gfx1250 (#151172)
- BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64
- DS_ADD_F64

Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>
2025-07-29 09:41:00 -07:00
Pierre van Houtryve
be17791f26
[AMDGPU][gfx1250] Add cu-store subtarget feature (#150588)
Determines whether we can use `SCOPE_CU` stores (on by default), or
whether all stores must be done at `SCOPE_SE` minimum.
2025-07-29 11:38:43 +02:00
Matt Arsenault
44ff1ed16e
AMDGPU: Move getMaxNumVectorRegs into GCNSubtarget (NFC) (#150889)
Addresses a TODO
2025-07-28 17:25:20 +09:00
Stanislav Mekhanoshin
96e5eed92a
[AMDGPU] Select VMEM prefetch for llvm.prefetch on gfx1250 (#150493)
We can use either a scalar or a vector prefetch for a uniform
pointer. Since we do not have scalar stores, our scalar cache is
practically read-only. The rw argument of the prefetch intrinsic is
used to force a vector operation even in the uniform case. On GFX12
a scalar prefetch is used regardless; it is still useful, but it
only brings data into L2.
2025-07-24 13:22:50 -07:00
Stanislav Mekhanoshin
a70f7dafc1
[AMDGPU] gfx1250 flat and global prefetch MC support (#150455) 2025-07-24 11:00:56 -07:00
Changpeng Fang
473bc0d188
[AMDGPU] Support V_FMA_MIX*_BF16 instructions on gfx1250 (#150381)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-24 09:43:49 -07:00
Changpeng Fang
eb43b79765
[AMDGPU] Disable SGPR read hazard mitigation for gfx1250 (#150344)
Co-authored-by: Jay Foad <Jay.Foad@amd.com>
2025-07-24 00:05:58 -07:00
Changpeng Fang
9a563b08e2
[AMDGPU] Support V_PK_MIN3/MAX3_NUM_F16 on gfx1250 (#150326)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-23 15:15:19 -07:00
Stanislav Mekhanoshin
2346968807
[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291) 2025-07-23 13:17:56 -07:00
Changpeng Fang
d385e9d86b
AMDGPU: Support V_PK_ADD_{MIN|MAX}_{I|U}16 and V_{MIN|MAX}3_{I|U}16 on gfx1250 (#150155)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-23 00:17:22 -07:00
Stanislav Mekhanoshin
a0973de745
[AMDGPU] Select scale_offset for global instructions on gfx1250 (#150107)
Also switches immediate offset to signed for the subtarget.
2025-07-22 15:04:52 -07:00
Harrison Hao
8c14d3f44f
[MISched] Use SchedRegion in overrideSchedPolicy and overridePostRASchedPolicy (#149297)
This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy`
to take a `SchedRegion` parameter instead of just `NumRegionInstrs`.
This provides access to both the instruction range and the parent
`MachineBasicBlock`, which enables looking up function-level attributes.

With this change, targets can select post-RA scheduling direction per
function using a function attribute. For example:

```cpp
void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
                               const SchedRegion &Region) const {
  const Function &F = Region.RegionBegin->getMF()->getFunction();
  Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction");
  ...
}
```
2025-07-22 15:55:12 +08:00
Stanislav Mekhanoshin
a0b854d576
[AMDGPU] MC support for gfx1250 scale_offset modifier (#149881) 2025-07-21 15:04:59 -07:00
Shilei Tian
7e105fbdbe
[AMDGPU] Add support for v_tanh_f32 on gfx1250 (#149360)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-17 15:42:35 -04:00
Stanislav Mekhanoshin
9912ccb0b4
[AMDGPU] gfx1250 MC support for FLAT GVS addressing (#149173) 2025-07-16 14:35:07 -07:00
Stanislav Mekhanoshin
82d7405b3b
[AMDGPU] Use S_ADD_PC_I64 for long branches in gfx1250 (#148961) 2025-07-15 17:14:56 -07:00
Stanislav Mekhanoshin
d1e3ab9c4b
[AMDGPU] Use v_mov_b64 in codegen on gfx1250 (#148272) 2025-07-11 22:16:50 -07:00
Stanislav Mekhanoshin
f090554359
[AMDGPU] MC support for v_fmaak_f64/v_fmamk_f64 gfx1250 instructions (#148282) 2025-07-11 14:17:03 -07:00
Stanislav Mekhanoshin
7920dff394
[AMDGPU] VOPD/VOPD3 changes for gfx1250 (#147602) 2025-07-10 14:15:01 -07:00
Stanislav Mekhanoshin
00a85e5704
[AMDGPU] gfx1250: MC support for 64-bit literals (#147861) 2025-07-09 22:25:47 -07:00
Stanislav Mekhanoshin
d0a4af725e
[AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594)
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2025-07-08 14:32:44 -07:00
Shilei Tian
d258457d42
[AMDGPU] Add support for v_cvt_f32_fp8 on gfx1250 (#147579)
Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>
2025-07-08 16:21:24 -04:00
Changpeng Fang
5035d20dcb
AMDGPU: Implement ds_atomic_async_barrier_arrive_b64/ds_atomic_barrier_arrive_rtn_b64 (#146409)
These two instructions are supported by gfx1250. We define the
instructions and implement the corresponding intrinsic and builtin.

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-01 11:08:49 -07:00
Changpeng Fang
4729242878
AMDGPU: Add MC layer support for load transpose instructions for gfx1250 (#146024)
Co-authored with @jayfoad
2025-06-26 22:30:31 -07:00
Changpeng Fang
2550a637a1
AMDGPU: Remove LDS-direct(param)-loads and VINTERP ops from gfx1250 support (#145631) 2025-06-24 22:44:35 -07:00
Changpeng Fang
3de2af3ef5
AMDGPU: Remove export and related instructions from gfx1250 support (#145624) 2025-06-24 19:59:26 -07:00
Changpeng Fang
fe8a26263a
AMDGPU: Remove Formatted MUBUF instructions from gfx1250 support (#145590) 2025-06-24 14:17:13 -07:00
Stanislav Mekhanoshin
fe0568389d
[AMDGPU] Require aligned VGPRs for gfx1250 (#145561) 2025-06-24 12:16:01 -07:00
Changpeng Fang
ce4d214947
AMDGPU: Remove MTBUF instructions from gfx1250 support (#145563) 2025-06-24 11:59:13 -07:00
Diana Picus
a201f8872a
[AMDGPU] Replace dynamic VGPR feature with attribute (#133444)
Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget
feature, as requested in #130030.
2025-06-24 11:09:36 +02:00
Stanislav Mekhanoshin
40eee8ec7f
[AMDGPU] Add s_setprio_inc_wg gfx1250 instruction (#145152) 2025-06-22 12:52:05 -07:00
Stanislav Mekhanoshin
affcc5e728
[AMDGPU] Add s_wait_xcnt gfx1250 instruction (#145086) 2025-06-20 12:28:18 -07:00