llvm-project

Author	SHA1	Message	Date
Carl Ritson	a3a3e6997b	[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass (#118750 ) - Algorithm operates over whole IR to attempt to minimize waits. - Add support for VALU->VALU SGPR hazards via VA_SDST/VA_VCC.	2025-01-30 11:21:11 +09:00
Shoreshen	e8811ad3cc	[AMDGPU] Fix unreachable reg bit width (#122107 ) Add register class bit width for SReg_256_XNULL and SReg_128_XNULL	2025-01-22 10:05:47 +07:00
Shilei Tian	7dbd6cd294	[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357 ) If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, it uses the default range as a starting point. We will no longer manipulate the known range, which can cause issues because the known range is a "throttle" to the assumed range such that the assumed range can't get widened properly in `updateImpl` if the known range is not set properly for whatever reason. Another benefit of not touching the known range is that, if we indicate pessimistic state, it also invalidates the AA such that `manifest` will not be called. Since we honor the attribute, we don't want to, and will not, add any half-baked attribute to a function.	2024-12-11 16:47:51 -05:00
Pravin Jagtap	5e007afa9d	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589 ) Presently, the compiler selectively adds a nop only when opsel != 0, i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop for the above cases irrespective of opsel values. Note: We might need to add a few others to the same table.	2024-12-11 18:38:10 +05:30
Pravin Jagtap	2469984144	[AMDGPU][NFC] Delete duplicate decl and impl defines. (#118843 )	2024-12-05 23:36:39 +05:30
Shilei Tian	68bcba6d7a	Revert "[AMDGPU] Use COV6 by default (#118515 )" This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.	2024-12-03 20:17:06 -05:00
Shilei Tian	410cbe3cf2	[AMDGPU] Use COV6 by default (#118515 )	2024-12-03 19:38:35 -05:00
Matt Arsenault	39337ff2dc	AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844 ) gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>	2024-12-02 09:23:17 -05:00
Matt Arsenault	d9c4e9ffe7	AMDGPU: Verify f8f6f4 formats in assembler (#117826 ) Verify the register widths of the corresponding operands match the floating point format expected size.	2024-11-26 23:45:03 -05:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Matt Arsenault	5a556d55fb	AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309 )	2024-11-18 10:48:56 -08:00
Joe Nash	8ed3b05582	[AMDGPU][True16][MC] Implement V_CVT_PK_F32_FP8/BF8 (#116106 ) Existing Fake16 versions of these instructions do not support op_sel on the _e32 encoding, which leaves a hole in the disassembler support. Implement the true16 version of the instructions in the MC layer.	2024-11-14 11:47:00 -05:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Brox Chen	e8644e3b47	[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436 ) Some old "t16" VOP2 instructions are actually in fake16 format. Correct and update test file	2024-11-05 16:12:49 -05:00
Matt Arsenault	0b40f97929	AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751 ) 0 does not make sense as a value for this to be, much less the default. Also stop emitting each individual field if it is the default, rather than if any element was the default. Also fix the name of the test since it didn't exactly match the real attribute name.	2024-11-05 12:50:44 -08:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Jay Foad	6f956e3117	[AMDGPU] Rename LocalMemorySize features to AddressableLocalMemorySize (#110242 ) Change the names of the TableGen features to match the names used by AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed by a single workgroup. Add some explanatory comments. NFC.	2024-09-30 10:29:31 +01:00
Craig Topper	fd50cdfb94	[AMDGPU] Use MCRegister. NFC	2024-09-28 11:40:25 -07:00
Scott Egerton	396f677514	[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769 )	2024-09-24 10:58:00 +01:00
Youngsuk Kim	d31e314131	[llvm] Don't call raw_string_ostream::flush() (NFC) Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b13610a5226b84889b923bae884ba395ad084d for further reference )	2024-09-20 12:19:59 -05:00
Jeffrey Byrnes	7bcf4d63cf	[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276 ) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op". This includes the case where the second op is SDWA with the same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op's dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as a tied-def of an implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] != 0 have a dest forwarding issue. Currently, we only check for CVT_SR_FP8_F32 with opsel[3] != 0; this PR adds support for opsel[2] != 0 as well.	2024-08-22 11:38:24 -07:00
Mariusz Sikora	2f89c1c76c	[AMDGPU][NFC] Remove duplicate code by using getAddressableLocalMemorySize (#104604 )	2024-08-17 08:11:03 +02:00
Ivan Kosarev	f0fe6c66cb	[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888 ) And declare it to take an MCRegister. Also rename related entities and remove a comment for the function that depending on its purpose is either irrelevant or misleading.	2024-08-14 17:04:15 +01:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Matt Arsenault	b3f5c7247d	AMDGPU: Assume true in getVOPNIsSingle helpers (#98516 ) If we have something we don't know what it is, we should conservatively avoid printing an additional suffix. For isCodeGenOnly pseudoinstructions, no encoded instruction is added to the tables this is queried, and the null case would assume true. This happens to fix the case I ran into, but this isn't a holistic fix. These really should be encoded directly in the TSFlags of the MCInstrDesc, which would allow encoding pseudos to work correctly.	2024-07-12 21:12:56 +04:00
vangthao95	3aef525aa4	[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165 ) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of https://github.com/llvm/llvm-project/pull/79553.	2024-06-24 14:18:23 -07:00
Matt Arsenault	8520061281	AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590 ) This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously requires gfx8-insts.	2024-06-18 18:34:34 +02:00
Scott Egerton	4a305d40a3	[AMDGPU] Exclude certain opcodes from being marked as single use (#91802 ) The s_singleuse_vdst instruction is used to mark regions of instructions that produce values that have only one use. Certain instructions take more than one cycle to execute, resulting in regions being incorrectly marked. This patch excludes these multi-cycle instructions from being marked as either producing single use values or consuming single use values or both depending on the instruction.	2024-06-12 10:43:23 +01:00
Janek van Oirschot	a699ccbf0c	MCExpr-ify amd_kernel_code_t (#91587 ) Redefines the amd_kernel_code_t struct with MCExprs for members that would be derived from SIProgramInfo MCExpr members.	2024-05-22 13:45:45 +01:00
Janek van Oirschot	d86b68afd7	MCExpr-ify SIProgramInfo (#88257 ) Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.	2024-05-09 13:02:32 +01:00
Emma Pilkington	dcc7ef3ce8	[AMDGPU][MC] Disable sendmsg SYSMSG_OP_HOST_TRAP_ACK on gfx9+ (#90203 ) This is no longer supported as of gfx9. Fixes #52903 This commit also includes some refactoring of sendmsg operand parsing: - Use CustomOperand for sendmsg operations, this allows them to be conditionally available based on a STI check (and automatically in sync with SIDefines.h). - Move CustomOperand table lookups from AMDGPUBaseInfo to AMDGPUAsmUtils. This cleans up an awkward interface where AMDGPUAsmUtils defined a table/size as globals that AMDGPUBaseInfo had to loop over. - Clean up a few of the operand lookup functions while moving them.	2024-05-07 07:38:58 -04:00
Emma Pilkington	607b4bc602	[AMDGPU] Add a missing COV6 case to getAMDHSACodeObjectVersion() (#87492 )	2024-04-03 15:36:58 -04:00
Joe Nash	e29228efae	[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418 ) Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on GFX1150Plus. The w32 and w64 _e64_dpp assembler-only real instructions were unused, and erroneously constructed in a way that broke parsing of the new instructions. They are removed. This patch is a follow-up to PR https://github.com/llvm/llvm-project/pull/87382	2024-04-03 14:51:27 -04:00
Janek van Oirschot	1103a2a337	Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494 ) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr. Relands #80855 with fixes	2024-03-27 11:59:56 +00:00
Mariusz Sikora	94a550dab2	[AMDGPU][NFC] Rename Feature GFX11FullVGPRs to 1_5xVGPRs (#86468 )	2024-03-25 11:00:59 +01:00
David Stuttard	75e528fdd9	[AMDGPU] Extend zero initialization of return values for TFE (#85759 ) buffer_load instructions that use TFE also need to zero initialize return values similar to how the image instructions currently work. Add support for this with standard zero init of all results + zero init of just TFE flag when enable-prt-strict-null subtarget feature is disabled.	2024-03-25 09:01:46 +00:00
Janek van Oirschot	797336b127	Revert "[AMDGPU] MCExpr-ify MC layer kernel descriptor" (#86151 ) Reverts llvm/llvm-project#80855	2024-03-21 10:19:54 -07:00
Janek van Oirschot	857161c367	[AMDGPU] MCExpr-ify MC layer kernel descriptor (#80855 ) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.	2024-03-21 13:57:10 +00:00
Carl Ritson	c29b265eb9	Reapply "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104 )" This reverts commit 7d508eb5d38f4bbbab4230a666d9e742e271af61.	2024-03-14 10:56:43 +09:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes that a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, and `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Diana Picus	0086cc95b3	[AMDGPU] Rename getNumVGPRBlocks. NFC (#84161 ) Rename getNumVGPRBlocks to getEncodedNumVGPRBlocks, to clarify that it's using the encoding granule. This is used to program the hardware. In practice, the hardware will use the alloc granule instead, so this patch also adds a new helper, getAllocatedNumVGPRBlocks, which can be useful when driving heuristics.	2024-03-07 12:46:42 +01:00
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
Shilei Tian	e9c1dbb408	Revert "[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )" This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks downstream.	2024-03-06 08:42:54 -05:00
Shilei Tian	530f0e64ec	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )	2024-03-04 08:40:42 -05:00
Ivan Kosarev	680c780a36	[AMDGPU][AsmParser] Support structured HWREG operands. (#82805 ) Symbolic values are to be supported separately.	2024-02-28 14:44:34 +00:00
Jeffrey Byrnes	113052b2b0	[AMDGPU] Prefer lower total register usage in regions with spilling Change-Id: Ia5c434b0945bdcbc357c5e06c3164118fc91df25	2024-02-26 12:19:52 -08:00
Ivan Kosarev	dfa1d9b027	[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772 ) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions.	2024-02-23 17:34:55 +00:00

365 Commits