llvm-project

Author	SHA1	Message	Date
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
Jay Foad	9c58f3a234	[AMDGPU] Fix implicit $vcc operands after parsing MIR (#87781 ) MIParser checks that implicit operands match the instruction definition, so they have to be $vcc even in wave32 mode. Use the mirFileLoaded hook to fix them after MIParser's checks, converting them to $vcc_lo, which is what the rest of CodeGen expects. This is all just extending the fixImplicitOperands hack which was introduced with GFX10, but at least it makes it possible to write a MIR test which creates the same instructions that normal CodeGen would generate.	2024-04-09 09:10:45 +01:00
Mariusz Sikora	94a550dab2	[AMDGPU][NFC] Rename Feature GFX11FullVGPRs to 1_5xVGPRs (#86468 )	2024-03-25 11:00:59 +01:00
Changpeng Fang	addda68f08	AMDGPU: Rename HasVinterInsts to HasVINTERPEncoding, NFC (#84535 )	2024-03-08 10:53:28 -08:00
Changpeng Fang	ee1bcf74ea	AMDGPU: Rename HasExpOrExportInsts to HasExportInsts. NFC (#84252 )	2024-03-06 14:59:28 -08:00
Changpeng Fang	96813de52d	AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248 ) FeatureDot11Insts (dot11-insts) for: v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8, v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8	2024-03-06 14:37:03 -08:00
Changpeng Fang	49ec8b747c	AMDGPU: Define and Use HasInterpInsts for interp inst definitions (#84102 )	2024-03-05 18:26:37 -08:00
Changpeng Fang	d6c52c1e2d	AMDGPU: Define HasExpOrExportInsts for export instruction definitions. (#84083 )	2024-03-05 15:32:32 -08:00
Jeffrey Byrnes	113052b2b0	[AMDGPU] Prefer lower total register usage in regions with spilling Change-Id: Ia5c434b0945bdcbc357c5e06c3164118fc91df25	2024-02-26 12:19:52 -08:00
Krzysztof Drewniak	b497234146	[AMDGPU] Make maximum hard clause size a subtarget feature (#81287 ) gfx11 chips may, in some conditions, behave incorrectly with S_CLAUSE instructions (hard clauses) containing more than 32 operations (that is, whose arguments exceed 0x1f). However, gfx10 targets will work successfully with clauses of up to length 63. Therefore, define the MaxHardClauseLength property on GCNSubtarget and make it a subtarget feature via tablegen, thus allowing us to specify, both now and in the future, the maximum viable size of clauses on various hardware from the tablegen definition. If MaxHardClauseLength is 0, which is the default, the hardware does not support hard clauses.	2024-02-15 13:58:31 -06:00
Austin Kerbow	4bcbeaed63	[AMDGPU] Enable kernel arg preloading with gfx90a (#81180 ) Add a trap instruction to the beginning of the kernel prologue to handle cases where preloading is attempted on HW loaded with incompatible firmware.	2024-02-12 22:33:29 -08:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPUs, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things; device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
Konstantin Zhuravlyov	a1df10da59	AMDGPU/NFC: Add predicate for supporting ds_add_f64 (#80379 )	2024-02-02 08:29:25 -05:00
Konstantin Zhuravlyov	4eee04585f	AMDGPU/NFC: Add predicate for supporting buffer/flat/global f64 atomics (#80209 )	2024-01-31 17:35:32 -05:00
Stanislav Mekhanoshin	1000cefc04	[AMDGPU] Remove s_set_inst_prefetch_distance support from GFX12 (#78786 ) This instruction is not supported by GFX12.	2024-01-22 14:31:17 -08:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Jay Foad	9ca36932b5	[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186 )	2024-01-18 10:23:27 +00:00
Jay Foad	c111dc72e9	[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193 ) https://github.com/llvm/llvm-project/pull/70634 has disabled use of potentially negative scratch offsets, but we can use it on GFX12. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2024-01-18 10:02:40 +00:00
Mariusz Sikora	264fd9e13e	[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439 )	2024-01-18 08:46:53 +01:00
Jay Foad	59cdf41f07	[AMDGPU] Do not run GCNNSAReassign pass for GFX12 (#78185 ) GFX12 does not have separate NSA and non-NSA encodings. --------- Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2024-01-17 15:22:48 +00:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633 )	2024-01-17 10:28:03 +00:00
Jay Foad	42b9ea841e	[AMDGPU] Increase max scratch allocation for GFX12 (#77625 )	2024-01-17 10:25:28 +00:00
Jay Foad	ba131b7017	[AMDGPU] Do not generate s_set_inst_prefetch_distance for GFX12 (#78190 ) GFX12 can still encode the s_set_inst_prefetch_distance instruction but it has no effect.	2024-01-15 18:20:45 +00:00
Jay Foad	ed60cb8fb9	[AMDGPU] Disable hasVALUPartialForwardingHazard for GFX12 (#78188 )	2024-01-15 18:20:10 +00:00
Jay Foad	85705bbf1d	[AMDGPU] Disable hasVALUMaskWriteHazard for GFX12 (#78187 )	2024-01-15 18:19:32 +00:00
Mariusz Sikora	2b83ceee3d	[AMDGPU][GFX12] Default component broadcast store (#76212 ) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>	2024-01-12 08:26:08 +01:00
Jay Foad	b120dae9bb	[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628 ) Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc operand in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set appropriately depending on whether a hazard is found or not, rather than inserting an S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated. Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2024-01-11 13:20:19 +00:00
Jay Foad	daa4728dee	[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825 )	2024-01-08 19:13:38 +00:00
Jay Foad	e96e7a9a86	[AMDGPU] Implement readcyclecounter for GFX12 (#76965 )	2024-01-05 08:20:52 +00:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522 ) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Mariusz Sikora	414d27419f	[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-15 17:15:55 +01:00
Mirko Brkušanin	a278ac577e	[AMDGPU] CodeGen for SMEM instructions (#75579 )	2023-12-15 12:10:33 +01:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Mirko Brkušanin	c1a6974d6b	[AMDGPU][MC] Add GFX12 SMEM encoding (#75215 )	2023-12-15 09:00:54 +01:00
Jay Foad	c5a068a196	[AMDGPU] Remove s_cmpk_* for GFX12 (#75497 ) No GFX12 encoding was added for these. This patch adds tests that they are not recognized by the assembler and defends against generating them in codegen.	2023-12-14 21:10:53 +00:00
Mirko Brkušanin	47615ddc84	[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193 )	2023-12-14 14:22:04 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Jay Foad	8005ee6da3	[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070 )	2023-12-12 17:41:40 +00:00
Mirko Brkušanin	f5868cb6a6	[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062 )	2023-12-04 13:04:42 +01:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Stephen Thomas	720be6c535	[AMDGPU] Add encoding/decoding support for non-result-returning ATOMIC_CSUB instructions (#68684 ) The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have encodings for non-value-returning forms, although actually using them isn't supported by hardware. However, these encodings aren't supported by the backend, meaning that they can't even be assembled or disassembled. Add support for the non-returning encodings, but gate actually using them in instruction selection behind a new feature FeatureAtomicCSubNoRtnInsts, which no target uses. This does allow the non-returning instructions to be tested manually and llvm.amdgcn.atomic.csub.ll is extended to cover them. The feature does not gate assembling or disassembling them, this is now not an error, and encoding and decoding tests have been adapted accordingly.	2023-10-11 11:37:27 +01:00
Jay Foad	2a0ec5f1ac	[AMDGPU] Add GFX11.5 s_singleuse_vdst instruction (#67536 )	2023-09-29 15:05:13 +01:00
Mirko Brkušanin	2cd2445c21	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461 ) In order to avoid duplicating every dpp pseudo opcode that has src1, we allow it for all opcodes and add manual checks on subtargets that do not support it.	2023-09-29 11:54:49 +02:00
Jay Foad	d85d143ad9	[AMDGPU] New image intrinsic optimizer pass (#67151 ) Implement a new pass to combine multiple image_load_2dmsaa and 2darraymsaa intrinsic calls into a single image_msaa_load if: - they refer to the same vaddr except for sample_id, - they use a constant sample_id and they fall into the same group, - they have the same dmask, and - the number of instructions and the number of vaddr/vdata dword transfers are reduced by the combine. This should be valid on all GFX11, but a hardware bug renders it unworkable on GFX11.0.*, so it is only enabled for GFX11.5. Based on a patch by Rodrigo Dominguez!	2023-09-26 09:33:49 +01:00
Austin Kerbow	0455596e1e	[AMDGPU] Add DAG ISel support for preloaded kernel arguments This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch. Preloading here begins from the start of the kernel arguments until the number of arguments indicated by the CL flag amdgpu-kernarg-preload-count. Aggregates and arguments passed by-ref are not supported. Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D158579	2023-09-25 09:32:59 -07:00
Mirko Brkušanin	923285b139	[AMDGPU] Add gfx1150 SALU Float instructions (#66884 )	2023-09-21 11:22:15 +02:00
Austin Kerbow	69447d6afe	[AMDGPU] Add ASM and MC updates for preloading kernargs Add assembler directives for preloading kernel arguments that correspond to new fields in the kernel descriptor for the length and offset of arguments that will be placed in SGPRs prior to kernel launch. Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated directly after other user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159459	2023-09-19 15:45:16 -07:00
Simon Pilgrim	47a9cd0343	[AMDGPU] Remove constexpr from getNumUserSGPRForField/getMaxNumPreloadedSGPRs to appease older gcc builds Older versions of gcc wouldn't accept the constexpr getNumUserSGPRForField (introduced in D159439 / 343be5132e2831d85) as it couldn't treat the llvm_unreachable call as constexpr	2023-09-13 12:19:28 +01:00


130 Commits