llvm-project

Author	SHA1	Message	Date
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
Mariusz Sikora	94a550dab2	[AMDGPU][NFC] Rename Feature GFX11FullVGPRs to 1_5xVGPRs (#86468 )	2024-03-25 11:00:59 +01:00
Joe Nash	2a3f27cce8	[AMDGPU][True16] Make NotHasTrue16BitInsts a True16Predicate (#84771 ) NFC. Test coverage on VOPC shows NotHasTrue16BitInsts on the pre-gfx11 instructions is necessary (we cannot use the default NoTrue16Predicate). Update the VOP2 instructions in the same manner.	2024-03-11 13:58:45 -04:00
Changpeng Fang	addda68f08	AMDGPU: Rename HasVinterInsts to HasVINTERPEncoding, NFC (#84535 )	2024-03-08 10:53:28 -08:00
Changpeng Fang	ee1bcf74ea	AMDGPU: Rename HasExpOrExportInsts to HasExportInsts. NFC (#84252 )	2024-03-06 14:59:28 -08:00
Changpeng Fang	96813de52d	AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248 ) FeatureDot11Insts (dot11-insts) for: v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8, v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8	2024-03-06 14:37:03 -08:00
Changpeng Fang	49ec8b747c	AMDGPU: Define and Use HasInterpInsts for interp inst definitions (#84102 )	2024-03-05 18:26:37 -08:00
Changpeng Fang	d6c52c1e2d	AMDGPU: Define HasExpOrExportInsts for export instruction definitions. (#84083 )	2024-03-05 15:32:32 -08:00
Ivan Kosarev	f122268c04	[AMDGPU][NFC] Extend PredicateControl to support True16 predicates. (#82245 ) Using OtherPredicates for True16 predicates is often problematic due to interference with other kinds of predicates, particularly when this overrides predicates inherited from pseudo instructions.	2024-02-20 11:37:44 +00:00
Krzysztof Drewniak	b497234146	[AMDGPU] Make maximum hard clause size a subtarget feature (#81287 ) gfx11 chips may, in some conditions, behave incorrectly with S_CLAUSE instructions (hard clauses) containing more than 32 operations (that is, whose arguments exceed 0x1f). However, gfx10 targets will work successfully with clauses of up to length 63. Therefore, define the MaxHardClauseLength property on GCNSubtarget and make it a subtarget feature via tablegen, thus allowing us to specify, both now and in the future, the maximum viable size of clauses on various hardware from the tablegen definition. If MaxHardClauseLength is 0, which is the default, the hardware does not support hard clauses.	2024-02-15 13:58:31 -06:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPUs, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things; device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
Konstantin Zhuravlyov	a1df10da59	AMDGPU/NFC: Add predicate for supporting ds_add_f64 (#80379 )	2024-02-02 08:29:25 -05:00
Konstantin Zhuravlyov	4eee04585f	AMDGPU/NFC: Add predicate for supporting buffer/flat/global f64 atomics (#80209 )	2024-01-31 17:35:32 -05:00
Jay Foad	c5d59fe1b2	[AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX11.5 (#79460 ) The hardware bug only affects GFX11.0.x.	2024-01-25 16:28:49 +00:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414 ) Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Jay Foad	6cf37dd504	[AMDGPU] Enable architected SGPRs for GFX12 (#79160 )	2024-01-23 16:36:30 +00:00
Jay Foad	ed12388082	[AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (#78709 ) That instruction is not supported on GFX12. Added a testcase which previously crashed without this change. Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>	2024-01-19 14:36:27 +00:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	264fd9e13e	[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439 )	2024-01-18 08:46:53 +01:00
Jay Foad	e4c8c58517	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (#77929 )	2024-01-17 15:57:36 +00:00
Jay Foad	e9e9d1b0b1	[AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (#77927 )	2024-01-17 11:52:19 +00:00
Mariusz Sikora	b61e5b0844	[AMDGPU][NFC] Add GFX numbers to DefaultComponent feature (#77894 )	2024-01-15 11:01:29 +01:00
Mariusz Sikora	2b83ceee3d	[AMDGPU][GFX12] Default component broadcast store (#76212 ) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>	2024-01-12 08:26:08 +01:00
Jay Foad	e96e7a9a86	[AMDGPU] Implement readcyclecounter for GFX12 (#76965 )	2024-01-05 08:20:52 +00:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Mirko Brkušanin	a278ac577e	[AMDGPU] CodeGen for SMEM instructions (#75579 )	2023-12-15 12:10:33 +01:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Mirko Brkušanin	c1a6974d6b	[AMDGPU][MC] Add GFX12 SMEM encoding (#75215 )	2023-12-15 09:00:54 +01:00
Jay Foad	1c55b227fe	[AMDGPU] Add GFX12 encoding and aliases for existing SOP (SALU) instructions (#74305 )	2023-12-05 10:07:06 +00:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Stephen Thomas	720be6c535	[AMDGPU] Add encoding/decoding support for non-result-returning ATOMIC_CSUB instructions (#68684 ) The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have encodings for non-value-returning forms, although actually using them isn't supported by hardware. However, these encodings aren't supported by the backend, meaning that they can't even be assembled or disassembled. Add support for the non-returning encodings, but gate actually using them in instruction selection behind a new feature FeatureAtomicCSubNoRtnInsts, which no target uses. This does allow the non-returning instructions to be tested manually and llvm.amdgcn.atomic.csub.ll is extended to cover them. The feature does not gate assembling or disassembling them, this is now not an error, and encoding and decoding tests have been adapted accordingly.	2023-10-11 11:37:27 +01:00
Jay Foad	2a0ec5f1ac	[AMDGPU] Add GFX11.5 s_singleuse_vdst instruction (#67536 )	2023-09-29 15:05:13 +01:00
Mirko Brkušanin	2cd2445c21	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461 ) In order to avoid duplicating every dpp pseudo opcode that has src1, we allow it for all opcodes and add manual checks on subtargets that do not support it.	2023-09-29 11:54:49 +02:00
Jay Foad	d85d143ad9	[AMDGPU] New image intrinsic optimizer pass (#67151 ) Implement a new pass to combine multiple image_load_2dmsaa and 2darraymsaa intrinsic calls into a single image_msaa_load if: - they refer to the same vaddr except for sample_id, - they use a constant sample_id and they fall into the same group, - they have the same dmask and the number of instructions and the number of vaddr/vdata dword transfers is reduced by the combine This should be valid on all GFX11 but a hardware bug renders it unworkable on GFX11.0.* so it is only enabled for GFX11.5. Based on a patch by Rodrigo Dominguez!	2023-09-26 09:33:49 +01:00
Ivan Kosarev	fab28e0e14	Reapply "[AMDGPU] Introduce real and keep fake True16 instructions." Reverts 6cb3866b1ce9d835402e414049478cea82427cf1. Analysis of failures on buildbots with expensive checks enabled showed that the problem was triggered by changes in another commit, 469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug addressed in #67245.	2023-09-23 22:07:41 +01:00
Ivan Kosarev	6cb3866b1c	Revert "[AMDGPU] Introduce real and keep fake True16 instructions." This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to failures on expensive checks.	2023-09-22 15:40:26 +01:00
Ivan Kosarev	0f864c7b8b	[AMDGPU] Introduce real and keep fake True16 instructions. The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionality is implemented and relevant tests are updated. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D156101	2023-09-22 10:57:56 +01:00
Ivan Kosarev	bea56b0bc0	[AMDGPU] Have a subtarget feature to control use of real True16 instructions. Real True16 instructions are as they are defined in the ISA. Fake True16 instructions are identical to real ones except that they take 32-bit registers as operands and always use their low halves. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156100	2023-09-22 10:47:13 +01:00
Mirko Brkušanin	923285b139	[AMDGPU] Add gfx1150 SALU Float instructions (#66884 )	2023-09-21 11:22:15 +02:00
Austin Kerbow	69447d6afe	[AMDGPU] Add ASM and MC updates for preloading kernargs Add assembler directives for preloading kernel arguments that correspond to new fields in the kernel descriptor for the length and offset of arguments that will be placed in SGPRs prior to kernel launch. Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated directly after other user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159459	2023-09-19 15:45:16 -07:00
Stanislav Mekhanoshin	cfe9a134bb	[AMDGPU] Rename 64BitDPP feature and fix the checks Names '64BitDPP' and especially 'DPP64' were found misleading, and DPP64 can easily be mixed up with DPP16 and DPP8 while these are different concepts. DPP16 and DPP8 refer to lanes, whereas DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465	2023-08-22 11:00:10 -07:00
Stanislav Mekhanoshin	f7480bc5c1	[AMDGPU] Decouple V_PK_MOV_B32 from FeaturePackedFP32Ops This is not an FP32 operation. Differential Revision: https://reviews.llvm.org/D157909	2023-08-14 14:22:52 -07:00
Stanislav Mekhanoshin	02046ad944	[AMDGPU] W/a for gfx940 byte0 fp8 conversion bug VOP1 form of these do not work. Differential Revision: https://reviews.llvm.org/D157683	2023-08-11 02:21:21 -07:00
Jay Foad	c2093b8504	[AMDGPU] Add target features for GDS and GWS GFX9 subtargets from GFX90A onwards lack GDS but still have GWS. Differential Revision: https://reviews.llvm.org/D156713	2023-08-02 09:02:07 +01:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
pvanhout	b6b6c8d2d7	[AMDGPU] Add more Common Feature Sets A small refactor to add more `_Common` feature sets for GFX8+. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D153843	2023-06-27 14:24:11 +02:00
Matt Arsenault	2f5a116cf7	AMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8 If we have legal f16 instructions but no f16 med3, we can save one instruction by expanding out the min/max sequence compared to casting to f32 and casting back.	2023-05-23 08:48:25 +01:00
Konstantin Zhuravlyov	42bd81410e	AMDGPU: Force sc0 and sc1 on stores for gfx940 and gfx941 Differential Revision: https://reviews.llvm.org/D149986	2023-05-12 11:53:19 -04:00
Konstantin Zhuravlyov	fae9e7d46c	AMDGPU: Factor out GFX9.4 common features into a feature set Differential Revision: https://reviews.llvm.org/D149985	2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov	9d05727972	AMDGPU: Add basic gfx942 target Differential Revision: https://reviews.llvm.org/D149983	2023-05-10 11:51:06 -04:00

292 Commits