130 Commits

Author SHA1 Message Date
Jun Wang
86842e1f72
[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236)
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-04-10 10:47:04 -07:00
Jay Foad
9c58f3a234
[AMDGPU] Fix implicit $vcc operands after parsing MIR (#87781)
MIParser checks that implicit operands match the instruction definition,
so they have to be $vcc even in wave32 mode. Use the mirFileLoaded hook
to fix them after MIParser's checks, converting them to $vcc_lo which is
what the rest of CodeGen expects.

This is all just extending the fixImplicitOperands hack which was
introduced with GFX10, but at least it makes it possible to write a MIR
test which creates the same instructions that normal CodeGen would
generate.
2024-04-09 09:10:45 +01:00
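The operand fix-up described in the commit above (#87781) can be pictured with a small model. This is only a sketch under stated assumptions: `Operand` and `fixImplicitVcc` are hypothetical names, registers are plain strings, and nothing here is the actual MIParser or mirFileLoaded code.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine operand.
struct Operand {
  std::string Reg;  // register name as it appears in MIR, e.g. "$vcc"
  bool IsImplicit;  // true for implicit operands
};

// After parsing, rewrite implicit $vcc operands to $vcc_lo in wave32 mode.
// Explicit operands and wave64 code are left untouched.
void fixImplicitVcc(std::vector<Operand> &Ops, bool IsWave32) {
  if (!IsWave32)
    return;
  for (Operand &Op : Ops)
    if (Op.IsImplicit && Op.Reg == "$vcc")
      Op.Reg = "$vcc_lo";
}
```

The point of running this after parsing is that the parser's own check still sees `$vcc`, while later passes see `$vcc_lo`.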
Mariusz Sikora
94a550dab2
[AMDGPU][NFC] Rename Feature GFX11FullVGPRs to 1_5xVGPRs (#86468) 2024-03-25 11:00:59 +01:00
Changpeng Fang
addda68f08
AMDGPU: Rename HasVinterInsts to HasVINTERPEncoding, NFC (#84535) 2024-03-08 10:53:28 -08:00
Changpeng Fang
ee1bcf74ea
AMDGPU: Rename HasExpOrExportInsts to HasExportInsts. NFC (#84252) 2024-03-06 14:59:28 -08:00
Changpeng Fang
96813de52d
AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248)
FeatureDot11Insts (dot11-insts) for:
  v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8,
  v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8
2024-03-06 14:37:03 -08:00
Changpeng Fang
49ec8b747c
AMDGPU: Define and Use HasInterpInsts for interp inst definitions (#84102) 2024-03-05 18:26:37 -08:00
Changpeng Fang
d6c52c1e2d
AMDGPU: Define HasExpOrExportInsts for export instruction definitions. (#84083) 2024-03-05 15:32:32 -08:00
Jeffrey Byrnes
113052b2b0 [AMDGPU] Prefer lower total register usage in regions with spilling
Change-Id: Ia5c434b0945bdcbc357c5e06c3164118fc91df25
2024-02-26 12:19:52 -08:00
Krzysztof Drewniak
b497234146
[AMDGPU] Make maximum hard clause size a subtarget feature (#81287)
gfx11 chips may, in some conditions, behave incorrectly with S_CLAUSE
instructions (hard clauses) containing more than 32 operations (that is,
whose arguments exceed 0x1f). However, gfx10 targets will work
successfully with clauses of up to length 63.

Therefore, define the MaxHardClauseLength property on GCNSubtarget and
make it a subtarget feature via tablegen, thus allowing us to specify,
both now and in the future, the maximum viable size of clauses on
various hardware from the tablegen definition. If MaxHardClauseLength is
0, which is the default, the hardware does not support hard clauses.
2024-02-15 13:58:31 -06:00
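A rough illustration of how a per-subtarget maximum clause length, as in the commit above (#81287), might be consumed. The function names are hypothetical and this is not the LLVM hard-clause pass; it only models the two facts stated in the message: a maximum of 0 means no hard clauses, and the S_CLAUSE argument is the length minus one (32 operations correspond to 0x1f). Any minimum clause length the real pass may enforce is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Split a run of NumInsts clausable instructions into clause lengths that
// never exceed MaxHardClauseLength. A maximum of 0 means the hardware has
// no hard clauses, so no clauses are formed at all.
std::vector<unsigned> splitIntoHardClauses(unsigned NumInsts,
                                           unsigned MaxHardClauseLength) {
  std::vector<unsigned> Clauses;
  if (MaxHardClauseLength == 0)
    return Clauses;
  while (NumInsts > 0) {
    unsigned Len = std::min(NumInsts, MaxHardClauseLength);
    Clauses.push_back(Len);
    NumInsts -= Len;
  }
  return Clauses;
}

// The S_CLAUSE argument encodes the clause length minus one.
unsigned clauseOperand(unsigned Len) {
  assert(Len > 0 && "a clause must contain at least one instruction");
  return Len - 1;
}
```

With the limits quoted in the message, a run of 70 instructions would be split as 32, 32, 6 under a maximum of 32, and as 63, 7 under a maximum of 63.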
Austin Kerbow
4bcbeaed63
[AMDGPU] Enable kernel arg preloading with gfx90a (#81180)
Add a trap instruction to the beginning of the kernel prologue to handle
cases where preloading is attempted on HW loaded with incompatible
firmware.
2024-02-12 22:33:29 -08:00
Pierre van Houtryve
f93aa5157a
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPUs, at the cost of fewer
optimization opportunities.

Note that this is just doing the compiler side of things; device libs and
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and #76954
as well.
2024-02-12 10:18:20 +01:00
Konstantin Zhuravlyov
a1df10da59
AMDGPU/NFC: Add predicate for supporting ds_add_f64 (#80379) 2024-02-02 08:29:25 -05:00
Konstantin Zhuravlyov
4eee04585f
AMDGPU/NFC: Add predicate for supporting buffer/flat/global f64 atomics (#80209) 2024-01-31 17:35:32 -05:00
Stanislav Mekhanoshin
1000cefc04
[AMDGPU] Remove s_set_inst_prefetch_distance support from GFX12 (#78786)
This instruction is not supported by GFX12.
2024-01-22 14:31:17 -08:00
Jay Foad
ba52f06f9d
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-18 10:47:45 +00:00
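The per-counter scheme from the commit above (#77438) can be sketched as one wait instruction per counter that actually needs waiting, instead of a single combined wait. Everything below is an illustrative model: `PendingWaits`, `NoWait` and `emitSeparateWaits` are hypothetical, and the lower-case mnemonics for SAMPLECNT and BVHCNT are formed by analogy with the S_WAIT_LOADCNT name given in the message rather than taken from it.

```cpp
#include <string>
#include <vector>

// Sentinel meaning "this counter does not need a wait".
const int NoWait = -1;

// Hypothetical record of the required counter values for the three counters
// that the commit says VMCNT is split into.
struct PendingWaits {
  int LoadCnt;
  int SampleCnt;
  int BvhCnt;
};

// Emit one wait instruction per counter that needs one, in a fixed order.
std::vector<std::string> emitSeparateWaits(const PendingWaits &W) {
  std::vector<std::string> Out;
  const char *Names[] = {"s_wait_loadcnt", "s_wait_samplecnt", "s_wait_bvhcnt"};
  const int Counts[] = {W.LoadCnt, W.SampleCnt, W.BvhCnt};
  for (int I = 0; I < 3; ++I)
    if (Counts[I] != NoWait)
      Out.push_back(std::string(Names[I]) + " " + std::to_string(Counts[I]));
  return Out;
}
```

A caller that only has outstanding loads and BVH operations would get two instructions back and no sample wait.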
Jay Foad
9ca36932b5
[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186) 2024-01-18 10:23:27 +00:00
Jay Foad
c111dc72e9
[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193)
https://github.com/llvm/llvm-project/pull/70634 has disabled use
of potentially negative scratch offsets, but we can use it on GFX12.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2024-01-18 10:02:40 +00:00
Mariusz Sikora
264fd9e13e
[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439) 2024-01-18 08:46:53 +01:00
Jay Foad
59cdf41f07
[AMDGPU] Do not run GCNNSAReassign pass for GFX12 (#78185)
GFX12 does not have separate NSA and non-NSA encodings.

---------

Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2024-01-17 15:22:48 +00:00
Jay Foad
4a77414660
[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633) 2024-01-17 10:28:03 +00:00
Jay Foad
42b9ea841e
[AMDGPU] Increase max scratch allocation for GFX12 (#77625) 2024-01-17 10:25:28 +00:00
Jay Foad
ba131b7017
[AMDGPU] Do not generate s_set_inst_prefetch_distance for GFX12 (#78190)
GFX12 can still encode the s_set_inst_prefetch_distance instruction but
it has no effect.
2024-01-15 18:20:45 +00:00
Jay Foad
ed60cb8fb9
[AMDGPU] Disable hasVALUPartialForwardingHazard for GFX12 (#78188) 2024-01-15 18:20:10 +00:00
Jay Foad
85705bbf1d
[AMDGPU] Disable hasVALUMaskWriteHazard for GFX12 (#78187) 2024-01-15 18:19:32 +00:00
Mariusz Sikora
2b83ceee3d
[AMDGPU][GFX12] Default component broadcast store (#76212)
For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So if we pass only
X component, it will be the same as XXXX, or XY same as XYXX.

This patch simplifies the passed vector of components in InstCombine by
removing components from the end that are equal to the first component.

For image stores it also trims DMask if necessary.

---------

Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>
2024-01-12 08:26:08 +01:00
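The simplification described in the commit above (#76212) amounts to dropping trailing components that equal the first one. A minimal sketch, assuming components are modeled as plain integers and using the hypothetical name `trimBroadcastComponents`; the real change operates on IR vectors in InstCombine and also adjusts DMask for image stores, which is not modeled here.

```cpp
#include <vector>

// Remove components from the end that are equal to the first component.
// At least one component is always kept, since hardware broadcasts from it.
std::vector<int> trimBroadcastComponents(std::vector<int> Comps) {
  while (Comps.size() > 1 && Comps.back() == Comps.front())
    Comps.pop_back();
  return Comps;
}
```

With X = 1 and Y = 2, the input XYXX becomes XY and XXXX becomes X, matching the examples in the message. Components that equal X but are followed by a different value, as in XXYX, are only trimmed from the end, giving XXY.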
Jay Foad
b120dae9bb
[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628)
Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc operand
in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set appropriately
depending on whether a hazard is found or not, rather than inserting an
S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated.

Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>
2024-01-11 13:20:19 +00:00
Jay Foad
daa4728dee
[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825) 2024-01-08 19:13:38 +00:00
Jay Foad
e96e7a9a86
[AMDGPU] Implement readcyclecounter for GFX12 (#76965) 2024-01-05 08:20:52 +00:00
Nicolai Hähnle
49b492048a
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and led to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.

Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.

Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.

Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.

Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable the use of opsel on src2 as a future optimization.

A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.

Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2024-01-04 00:10:15 +01:00
Mariusz Sikora
414d27419f
[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-15 17:15:55 +01:00
Mirko Brkušanin
a278ac577e
[AMDGPU] CodeGen for SMEM instructions (#75579) 2023-12-15 12:10:33 +01:00
Mirko Brkušanin
569ef8ddd9
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204) 2023-12-15 10:41:05 +01:00
Mirko Brkušanin
c1a6974d6b
[AMDGPU][MC] Add GFX12 SMEM encoding (#75215) 2023-12-15 09:00:54 +01:00
Jay Foad
c5a068a196
[AMDGPU] Remove s_cmpk_* for GFX12 (#75497)
No GFX12 encoding was added for these. This patch adds tests that they
are not recognized by the assembler and defends against generating them
in codegen.
2023-12-14 21:10:53 +00:00
Mirko Brkušanin
47615ddc84
[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193) 2023-12-14 14:22:04 +01:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Jay Foad
8005ee6da3
[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070) 2023-12-12 17:41:40 +00:00
Mirko Brkušanin
f5868cb6a6
[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062) 2023-12-04 13:04:42 +01:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Stephen Thomas
720be6c535
[AMDGPU] Add encoding/decoding support for non-result-returning ATOMIC_CSUB instructions (#68684)
The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have encodings
for non-value-returning forms, although actually using them isn't supported
by hardware. However, these encodings aren't supported by the backend,
meaning that they can't even be assembled or disassembled.

Add support for the non-returning encodings, but gate actually using them
in instruction selection behind a new feature FeatureAtomicCSubNoRtnInsts,
which no target uses. This does allow the non-returning instructions to be
tested manually, and llvm.amdgcn.atomic.csub.ll is extended to cover them.
The feature does not gate assembling or disassembling them; this is no
longer an error, and encoding and decoding tests have been adapted
accordingly.
2023-10-11 11:37:27 +01:00
Jay Foad
2a0ec5f1ac
[AMDGPU] Add GFX11.5 s_singleuse_vdst instruction (#67536) 2023-09-29 15:05:13 +01:00
Mirko Brkušanin
2cd2445c21
[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461)
In order to avoid duplicating every dpp pseudo opcode that has src1, we
allow it for all opcodes and add manual checks on subtargets that do not
support it.
2023-09-29 11:54:49 +02:00
Jay Foad
d85d143ad9
[AMDGPU] New image intrinsic optimizer pass (#67151)
Implement a new pass to combine multiple image_load_2dmsaa and
2darraymsaa intrinsic calls into a single image_msaa_load if:

- they refer to the same vaddr except for sample_id,
- they use a constant sample_id and they fall into the same group,
- they have the same dmask and the number of instructions and the
  number of vaddr/vdata dword transfers is reduced by the combine

This should be valid on all GFX11 but a hardware bug renders it
unworkable on GFX11.0.* so it is only enabled for GFX11.5.

Based on a patch by Rodrigo Dominguez!
2023-09-26 09:33:49 +01:00
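The grouping conditions listed in the commit above (#67151) can be sketched as a keyed bucket structure. This is a loose model, not the pass itself: `MsaaLoad`, `groupMsaaLoads` and `countCombinableGroups` are hypothetical, the address is reduced to two integer coordinates, and the group size of 4 samples is an assumption that the message does not state. The profitability check on dword transfers is reduced to "more than one load in the bucket".

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical model of one MSAA image load intrinsic call.
struct MsaaLoad {
  int X, Y;          // vaddr components other than the sample index
  unsigned DMask;    // channel mask
  unsigned SampleId; // assumed to be a known constant
};

// Assumption: sample indices fall into groups of four consecutive samples.
const unsigned SamplesPerGroup = 4;

using GroupKey = std::tuple<int, int, unsigned, unsigned>;

// Bucket loads that share vaddr (except sample id), dmask and sample group.
std::map<GroupKey, std::vector<std::size_t>>
groupMsaaLoads(const std::vector<MsaaLoad> &Loads) {
  std::map<GroupKey, std::vector<std::size_t>> Groups;
  for (std::size_t I = 0; I < Loads.size(); ++I) {
    const MsaaLoad &L = Loads[I];
    GroupKey Key(L.X, L.Y, L.DMask, L.SampleId / SamplesPerGroup);
    Groups[Key].push_back(I);
  }
  return Groups;
}

// Only buckets with more than one load reduce the instruction count.
std::size_t countCombinableGroups(const std::vector<MsaaLoad> &Loads) {
  std::size_t N = 0;
  for (const auto &KV : groupMsaaLoads(Loads))
    if (KV.second.size() > 1)
      ++N;
  return N;
}
```

Loads that differ in coordinates, in dmask, or in sample group land in separate buckets and are left alone.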
Austin Kerbow
0455596e1e [AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patches
in the series will make the codegen backwards compatible. This patch
should only be submitted alongside that subsequent patch.

Preloading here begins from the start of the kernel arguments and
continues up to the number of arguments indicated by the CL flag
amdgpu-kernarg-preload-count.

Aggregates and arguments passed by-ref are not supported.

Special care for the alignment of the kernarg segment is needed as well
as consideration of the alignment of addressable SGPR tuples when we
cannot directly use misaligned large tuples that the arguments are
loaded to.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D158579
2023-09-25 09:32:59 -07:00
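The preload range described in the commit above (D158579) is a prefix of the kernel arguments bounded by the requested count. A minimal sketch with hypothetical names (`KernArg`, `countPreloadedArgs`); it assumes, beyond what the message says, that the first aggregate or by-ref argument ends the preloaded prefix, and it ignores the SGPR alignment concerns the message mentions.

```cpp
#include <vector>

// Hypothetical description of a kernel argument.
struct KernArg {
  bool IsAggregate; // struct/array argument
  bool IsByRef;     // argument passed by reference
};

// Count how many leading arguments would be preloaded for a requested
// preload count. Assumption: an unsupported argument stops the prefix.
unsigned countPreloadedArgs(const std::vector<KernArg> &Args,
                            unsigned PreloadCount) {
  unsigned N = 0;
  for (const KernArg &A : Args) {
    if (N == PreloadCount)
      break;
    if (A.IsAggregate || A.IsByRef)
      break;
    ++N;
  }
  return N;
}
```

For four scalar arguments and a requested count of 2, only the first two are preloaded; a request larger than the argument list simply preloads everything that is supported.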
Mirko Brkušanin
923285b139
[AMDGPU] Add gfx1150 SALU Float instructions (#66884) 2023-09-21 11:22:15 +02:00
Austin Kerbow
69447d6afe [AMDGPU] Add ASM and MC updates for preloading kernargs
Add assembler directives for preloading kernel arguments that correspond
to new fields in the kernel descriptor for the length and offset of
arguments that will be placed in SGPRs prior to kernel launch. Alignment
of the arguments in SGPRs is equivalent to the kernarg segment when
accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated
directly after other user SGPRs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159459
2023-09-19 15:45:16 -07:00
Simon Pilgrim
47a9cd0343 [AMDGPU] Remove constexpr from getNumUserSGPRForField/getMaxNumPreloadedSGPRs to appease older gcc builds
Older versions of gcc wouldn't accept the constexpr getNumUserSGPRForField (introduced in D159439 / 343be5132e2831d85) as it couldn't treat the llvm_unreachable call as constexpr
2023-09-13 12:19:28 +01:00