llvm-project

Author	SHA1	Message	Date
Matt Arsenault	6f8e7c11cf	AMDGPU: Add MC support for gfx950 V_BITOP3_B32/B16 (#117379) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 09:42:07 -08:00
Matt Arsenault	cd20fc0772	AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410) This is a refinement for the existing hack. With this, the default target will have neither wavefrontsize feature present, unless it was explicitly specified. That is, getWavefrontSize() == 64 no longer implies +wavefrontsize64. getWavefrontSize() == 32 does imply +wavefrontsize32. Continue to assume the value is 64 with no wavesize feature. This maintains the codegenable property without any code that directly cares about the wavesize needing to worry about it. Introduce an isWaveSizeKnown helper to check if we know the wavesize is accurate based on having one of the features explicitly set, or a known target-cpu. I'm not sure what's going on in wave_any.s. It's testing what happens when both wavesizes are enabled, but this is treated as an error in codegen. We now treat wave32 as the winning case, so some cases that were previously printed as vcc are now vcc_lo.	2024-11-23 09:27:47 -08:00
Jay Foad	ade0750e35	[AMDGPU] Fix some cache policy checks for GFX12+ (#116396) Fix coding errors found by inspection and check that the swz bit still serves to prevent merging of buffer loads/stores on GFX12+.	2024-11-21 08:22:59 +00:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Kazu Hirata	4048c64306	[llvm] Remove redundant control flow statements (NFC) (#115831) Identified with readability-redundant-control-flow.	2024-11-12 10:09:42 -08:00
Fangrui Song	facdae62b7	[MCInstPrinter] Make printRegName non-const Similar to printInst. printRegName may change states (e.g. #113834).	2024-10-29 19:14:54 -07:00
Craig Topper	fd50cdfb94	[AMDGPU] Use MCRegister. NFC	2024-09-28 11:40:25 -07:00
Jun Wang	cd5f5b7690	[AMDGPU][MC] Implement fft and rotate modes for ds_swizzle_b32 (#108064) In addition to the basic mode, the ds_swizzle_b32 is supposed to support two specific modes: fft and rotate. This patch implements those two modes.	2024-09-27 10:18:34 -07:00
Matt Arsenault	d2b6a8ee67	AMDGPU: Fix asserting when trying to print scc (#101175) This is printable using inline assembly. Also we should handle using scc directly as instruction operands.	2024-07-30 17:41:13 +04:00
Ivan Kosarev	5bd3aef5e2	[AMDGPU] Use a generic printer for NamedIntOperands. (#100399) This includes simplifying printing dmask modifiers where we don't need to mask the value to print. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-07-29 09:54:05 +01:00
Ivan Kosarev	7a3bc44c89	[AMDGPU][MC][NFCI] Eliminate printU4ImmDecOperand(). (#100589) This is hoped to make things a bit safer not masking the value to print and to make the logic in printDPPCtrl() a bit more explicit. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-07-26 10:10:30 +01:00
Ivan Kosarev	24a18aafa3	[AMDGPU] Simplify printing row/bank_mask modifiers. (#100575) And fix a codegen test to use mask values that fit their encoding fields. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-07-25 16:44:44 +01:00
Ivan Kosarev	430cf6537b	[AMDGPU][NFCI] Declare offset0/1 operands to be i32. (#100560) Being of type i8 makes them signed, which they aren't, and requires extra work masking them on verbalisation. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-07-25 14:32:19 +01:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)	2024-07-17 21:11:00 +01:00
Ivan Kosarev	162386693f	[AMDGPU][MC] Support UC_VERSION_* constants. (#95618) Our other tools support them, so we want them in LLVM assembler/disassembler too.	2024-06-18 15:44:14 +01:00
Pierre van Houtryve	9ab601fa53	Reland "[NFC][AMDGPU] Do not flush after printing every instruction (#95237)" It's very expensive and doesn't achieve anything. I one test I did, it saves almost 10s on a 2m23s build, bringing it down to 2m15s using a downstream branch.	2024-06-13 08:30:06 +02:00
pvanhout	04c4cf45fa	Revert "[NFC][AMDGPU] Do not flush after printing every instruction (#95237)" This reverts commit ad9fe3b2a949fb3379e0a1bafbcd2ca81f5fa414.	2024-06-12 15:24:21 +02:00
Pierre van Houtryve	ad9fe3b2a9	[NFC][AMDGPU] Do not flush after printing every instruction (#95237) It's very expensive and doesn't achieve anything. I one test I did, it saves almost 10s on a 2m23s build, bringing it down to 2m15s using a downstream branch.	2024-06-12 14:57:41 +02:00
Stanislav Mekhanoshin	6e722bbe30	[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244)	2024-04-26 13:02:57 -07:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402) The current implementation of `isInlinableLiteral16` assumes, a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and call the corresponding version.	2024-03-08 14:49:52 -05:00
Shilei Tian	e9c1dbb408	Revert "[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345)" This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks downstream.	2024-03-06 08:42:54 -05:00
Shilei Tian	530f0e64ec	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345)	2024-03-04 08:40:42 -05:00
Ivan Kosarev	dfa1d9b027	[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions.	2024-02-23 17:34:55 +00:00
Shilei Tian	46734aa1e5	[AMDGPU] Use `bf16` instead of `i16` for bfloat (#80908) Currently we generally use `i16` to represent `bf16` in those tablegen files. This patch is trying to use `bf16` directly. Fix #79369.	2024-02-16 15:58:30 -05:00
Mirko Brkušanin	815e0485a4	[AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instrucitons (#81158)	2024-02-12 19:01:58 +01:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Mariusz Sikora	28b7e498b6	AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892) Endoding is VOP3P. Tagged as deep/machine learning instructions. i32 type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32 fneg(neg_lo[2]) and f32 fabs(neg_hi[2]). --------- Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>	2024-01-18 14:00:27 +01:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and lead to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable to use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Mirko Brkušanin	82e33d6203	[AMDGPU] Add VDSDIR instructions for GFX12 (#75197)	2024-01-03 16:32:00 +01:00
Mirko Brkušanin	47615ddc84	[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193)	2023-12-14 14:22:04 +01:00
Mirko Brkušanin	ac406b4817	[AMDGPU][MC] Add GFX12 VBUFFER encoding (#75195)	2023-12-14 12:58:18 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Mirko Brkušanin	f5868cb6a6	[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062)	2023-12-04 13:04:42 +01:00
Stanislav Mekhanoshin	ab6c3d5034	[AMDGPU] Change the representation of double literals in operands (#68740) A 64-bit literal can be used as a 32-bit zero or sign extended operand. In case of double zeroes are added to the low 32 bits. Currently asm parser stores only high 32 bits of a double into an operand. To support codegen as requested by the https://github.com/llvm/llvm-project/issues/67781 we need to change the representation to store a full 64-bit value so that codegen can simply add immediates to an instruction. There is some code to support compatibility with existing tests and asm kernels. We allow to use short hex strings to represent only a high 32 bit of a double value as a valid literal.	2023-10-12 14:45:45 -07:00
Ivan Kosarev	c62f208c05	[AMDGPU] Don't suppress printing the .l and .h register suffixes. We don't seem to have a use for the -amdgpu-keep-16-bit-reg-suffixes option anymore. Was introduced in <https://reviews.llvm.org/D79435>. Reviewed By: Joe_Nash, foad Differential Revision: https://reviews.llvm.org/D156102	2023-09-22 11:13:05 +01:00
Carl Ritson	6ebc179978	[AMDGPU][MC][GFX11] Always output wait_vdst and wait_exp (#66610) Always output values of wait_vdst and wait_exp in assembly even when they are zero. While we normally avoid outputing default/zero parameters in assembly, the values of these parameters still imply wait behaviour when zero. Outputing zero values makes the intent more obvious to human readers, and avoid any future ambiguity if we choose to change the defaults to something other than zero. Fixes #66383	2023-09-22 09:25:02 +09:00
Stanislav Mekhanoshin	cfe9a134bb	[AMDGPU] Rename 64BitDPP feature and fix the checks Names '64BitDPP' and especially 'DPP64' were found misleading, and DPP64 can easily be mixed with DPP16 and DPP8 while these are different concepts. DPP16 and DPP8 refers to lanes where DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465	2023-08-22 11:00:10 -07:00
Reid Kleckner	f86c81b2a8	[AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc This required two substantial changes: 1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils and into CodeGen 2. Passing the string function name to AMDGPUPALMetadata instead of the MachineFunction Other changes are minor or updates to accommodate the first two. See issue #64166 for more information on the layering issue. Differential Revision: https://reviews.llvm.org/D156486	2023-07-27 15:19:24 -07:00
Ivan Kosarev	7208fde09e	[AMDGPU][AsmParser][NFC] Generate printers for named-bit operands automatically. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D154433	2023-07-05 10:53:33 +01:00
Ivan Kosarev	12460cf90f	[AMDGPU][AsmParser] Simplify the implementation of SWZ operands. Those are implicit helper operands and therefore don't need any parsers or printers. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: piotr, foad Differential Revision: https://reviews.llvm.org/D154432	2023-07-05 10:45:12 +01:00
Ivan Kosarev	59fd48d71e	[AMDGPU][AsmParser][NFC] Simplify instruction operand definitions. This addresses the trivial cases that only require removing the operand classes and renaming related entities. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D153965	2023-06-29 10:51:44 +01:00
Ivan Kosarev	212af2c081	[AMDGPU][AsmParser] Refine parsing of some 32-bit instruction operands. Eliminates the need for the custom code in parseCustomOperand(). The remaining uses of NamedOperandU32 are to be addressed separately. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D150204	2023-05-19 16:54:30 +01:00
Fangrui Song	432caca39a	Simplify with hasFeature. NFC	2023-02-17 18:22:24 -08:00
Kazu Hirata	64dad4ba9a	Use llvm::bit_cast (NFC)	2023-02-14 01:22:12 -08:00
Ivan Kosarev	3d6b108a87	[AMDGPU] Remove the unused u8imm operand definition. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D142193	2023-02-07 11:48:38 +00:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Petar Avramovic	b0c1a45ba5	AMDGPU/MC: Refactor decoders. Rework decoders for float immediates decodeFPImmed creates immediate operand using register operand width, but size of created immediate should correspond to OperandType for RegisterOperand. e.g. OPW128 could be used for RegisterOperands that use v2f64 v4f32 and v8f16. Each RegisterOperands would have different OperandType and require that immediate is decoded using 64, 32 and 16 bit immediate respectively. decodeOperand_<RegClass> only provides width for register decoding, introduce decodeOperand_<RegClass>_Imm<ImmWidth> that also provides width for immediate decoding. Refactor RegisterOperands: - decoders get _Imm<ImmWidth> suffix in some cases - removed unused RegisterOperands defined via multiclass - use different RegisterOperand in a few places, new RegisterOperand's decoder corresponds to the number of bits used for operand's encoding Refactor decoder functions: - add asserts for the size of encoding that will be decoded - regroup them according to the method of decoding decodeOperand_<RegClass> (register only, no immediate) decoders can now create immediate of consistent size, use it for better diagnostic of 'invalid immediate'. Differential Revision: https://reviews.llvm.org/D142636	2023-02-01 16:52:57 +01:00
Mateja Marjanovic	f84d3dd0fd	[AMDGPU] Make flat_offset a 32-bit operand instead of 16-bits Differential Revision: https://reviews.llvm.org/D142549	2023-01-25 17:52:26 +01:00

131 Commits