llvm-project

Author	SHA1	Message	Date
Jay Foad	c246b7bd4a	[AMDGPU] Only count global-to-global as indirect accesses Previously any load (global, local or constant) feeding into a global load or store would be counted as an indirect access. This patch only counts global loads feeding into a global load or store. The rationale is that the latency for global loads is generally much larger than the other kinds. As a side effect this makes it easier to write small kernel test cases that are not counted as having indirect accesses, despite the fact that arguments to the kernel are accessed with an SMEM load. Differential Revision: https://reviews.llvm.org/D122804	2022-04-01 13:48:13 +01:00
Thomas Symalla	1a6aa8b195	[AMDGPU] Add missing use check in SIOptimizeExecMasking pass. Whenever a v_cmp, s_and_saveexec instruction sequence shall be transformed to an equivalent s_mov, v_cmpx sequence, it needs to be detected if the v_cmp target register is used between the two instructions as the v_cmp result gets omitted by using the v_cmpx instruction, resulting in invalid code. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D122797	2022-03-31 19:25:35 +02:00
Abinav Puthan Purayil	898d5776ec	[AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer Differential Revision: https://reviews.llvm.org/D122803	2022-03-31 21:46:34 +05:30
Changpeng Fang	1711020c37	AMDGPU: Use isLiteralConstantLike to check whether the operand could ever be literal Summary: To compute the size of a VALU/SALU instruction, we need to check whether an operand could ever be literal. Previously isLiteralConstant was used, which missed cases like global variables or external symbols. These misses lead to under-estimation of the instruction size and branch offset, and thus incorrectly skip the necessary branch relaxation when the branch offset is actually greater than what the branch bits can hold. In this work, we use isLiteralConstantLike to check the operands. It may be conservative, but it is safe. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D122778	2022-03-31 08:06:31 -07:00
Abinav Puthan Purayil	acf83abcbf	[AMDGPU][GlobalISel] Remove unused variable. NFC.	2022-03-31 16:50:34 +05:30
Stanislav Mekhanoshin	f311f934e1	[AMDGPU] gfx940 VALU hazard recognizer Differential Revision: https://reviews.llvm.org/D122339	2022-03-29 10:57:54 -07:00
Shao-Ce SUN	662b9fa02c	[NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122557	2022-03-29 09:53:24 +08:00
Changpeng Fang	8384ced974	[AMDGPU][NFC]: Remove unnecessary MFI functions Summary: hasHostcallPtr() and hasHeapPtr() are only used in metadata emit. However, we can use the corresponding function attributes directly instead of introducing the functions. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D122600	2022-03-28 12:13:33 -07:00
Thomas Symalla	3bd15c03c6	[AMDGPU] Fix adding modifiers when creating v_cmpx instructions. Revision https://reviews.llvm.org/D122332 added a pattern transformation where v_cmpx instructions are introduced. However, the modifiers are not correctly inherited from the original operands. The patch adds the source modifiers, if they exist, or sets them to 0. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D122489	2022-03-28 17:52:53 +02:00
Carl Ritson	1f52d02ceb	[AMDGPU] Split waterfall loop exec manipulation Split waterfall loops into multiple blocks so that exec mask manipulation (s_and_saveexec) does not occur in the middle of a block. VGPR live range optimizer is updated to handle waterfall loops spanning multiple blocks. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D122200	2022-03-28 17:44:54 +09:00
Kazu Hirata	6212871968	[Target] Apply clang-tidy fixes for readability-redundant-member-init (NFC)	2022-03-27 22:22:37 -07:00
Maksim Panchenko	4ae9745af1	[Disassember][NFCI] Use strong type for instruction decoder All LLVM backends use MCDisassembler as a base class for their instruction decoders. Use "const MCDisassembler *" for the decoder instead of "const void *". Remove unnecessary static casts. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D122245	2022-03-25 18:53:59 -07:00
Jay Foad	be9acee059	[AMDGPU] Move VOP3 classes into VOPInstructions.td. NFC. These classes are also used by VOP1/2/C instructions. Differential Revision: https://reviews.llvm.org/D122470	2022-03-25 13:56:43 +00:00
Thomas Symalla	718aec209c	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Same as D119696 including a buildbot and MIR test fix. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D122332	2022-03-25 11:40:18 +01:00
Stanislav Mekhanoshin	64838ba365	[AMDGPU] Use GenericTable to classify DGEMM Since there is a table introduced for MAI instructions extend it to use for DGEMM classification. Differential Revision: https://reviews.llvm.org/D122337	2022-03-24 13:00:37 -07:00
Stanislav Mekhanoshin	cad9de71d7	[AMDGPU] gfx940 MAI hazard recognizer Differential Revision: https://reviews.llvm.org/D122263	2022-03-24 12:59:52 -07:00
Stanislav Mekhanoshin	6e3e14f600	[AMDGPU] Support gfx940 smfmac instructions Differential Revision: https://reviews.llvm.org/D122191	2022-03-24 12:40:42 -07:00
Stanislav Mekhanoshin	27439a7642	[AMDGPU] New gfx940 mfma instructions Differential Revision: https://reviews.llvm.org/D122044	2022-03-24 12:12:52 -07:00
Vasileios Porpodas	39aa202aff	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit e6ead19b774718113007ecb1a4449d7af0cbcfeb.	2022-03-23 18:32:17 -07:00
Austin Kerbow	1e15adba62	[AMDGPU] Add s_nop WaitStates between neighboring mfma In some cases padding bubbles between sequential MFMA instructions may lead to increased inter-wave performance. Add option to request to pad some portion of these stall cycles with s_nops. Fixes: SWDEV-326925 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121437	2022-03-23 13:56:09 -07:00
Arthur Eubanks	e6ead19b77	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash." This reverts commit 27bd8f94928201f87f6b659fc2228efd539e8245. Causes crashes, see comments in D121973	2022-03-23 10:57:45 -07:00
hsmahesha	f5b6866d7e	[AMDGPU] Add missing testcase for SGPR to AGPR copy, and also update the function indirectCopyToAGPR() to ensure that it is called only on GFX908 sub-target. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D122286	2022-03-23 21:38:04 +05:30
hsmahesha	f014303e2c	[AMDGPU] [NFC]: Organize the code around reserving registers. First, add code to reserve all required special purpose registers, followed by code to reserve SGPRs, followed by code to reserve VGPRs/AGPRs. This patch is prepared as a pre-requisite to fix an issue related to GFX90A hardware. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122219	2022-03-23 07:15:59 +05:30
Vasileios Porpodas	27bd8f9492	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit f7d7d2a08d16356c57f6d2d36bc2fc0589a55df9.	2022-03-22 16:41:55 -07:00
Stanislav Mekhanoshin	72c1a0d9c2	[AMDGPU] Allow v_accvgpr_write to use SGPR on gfx90a This is undocumented, but it should work. Differential Revision: https://reviews.llvm.org/D122252	2022-03-22 13:52:29 -07:00
Arthur Eubanks	f7d7d2a08d	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads."" This reverts commit 79613185d305013de743cdbd6690e4d77c8af27e. Causes crashes, see comments in https://reviews.llvm.org/D121973.	2022-03-22 13:33:49 -07:00
alex-t	7636c9a929	[AMDGPU] use scalar shift for SALU users in frame index elimination In the frame index lowering we have to insert shift and add instructions to adjust stack object access. We need to take care of the stack object user kind and use scalar shift/add for scalar users. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121524	2022-03-22 13:16:24 +01:00
alex-t	0a488cba2c	[AMDGPU] use scalar shift for SALU users in frame index elimination In the frame index lowering we have to insert shift and add instructions to adjust stack object access. We need to take care of the stack object user kind and use scalar shift/add for scalar users. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121524	2022-03-22 11:43:23 +01:00
Vasileios Porpodas	79613185d3	Recommit "[SLP] Fix lookahead operand reordering for splat loads." Original review: https://reviews.llvm.org/D121354 The original commit 9136145eb019e1d18c966d4d06a3df349b88cc14 broke the build on several targets. Differential Revision: https://reviews.llvm.org/D121973	2022-03-21 15:57:32 -07:00
Stanislav Mekhanoshin	9b1fa6f89f	[AMDGPU] Fix AV classes VTs. NFCI. NFC at this point, but will be used in a later patch. Differential Revision: https://reviews.llvm.org/D122174	2022-03-21 13:38:05 -07:00
alex-t	a0ea7ec90f	[AMDGPU] divergence patterns for the BUILD_VECTOR i16, undef expansion. BUILD_VECTOR of i16 and undef gets expanded to the COPY_TO_REGCLASS. The latter is further lowered to the copy instructions. We need to provide the correct register class for the uniform and divergent BUILD_VECTOR nodes to avoid VGPR to SGPR copies. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D122068	2022-03-21 21:11:20 +01:00
Dmitry Preobrazhensky	1d817a1448	[AMDGPU][MC][NFC] Refactored sendmsg(...) handling Differential Revision: https://reviews.llvm.org/D121995	2022-03-21 15:37:30 +03:00
Jay Foad	321d3aae7c	[AMDGPU] SIInstrInfo::verifyInstruction tweaks. NFCI. Simplify some for loops. Don't bother checking src2 operand for writelane because it doesn't have one. Check all VALU instructions, not just VOP1/2/3/C/SDWA.	2022-03-21 11:15:55 +00:00
Thomas Symalla	7de6107dce	Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3." This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64. Differential Revision: https://reviews.llvm.org/D122117	2022-03-21 09:50:44 +01:00
Thomas Symalla	e725e2afe0	[AMDGPU] [NFC] Fix missing include.	2022-03-21 09:37:22 +01:00
Thomas Symalla	011c64191e	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Reviewed By: sebastian-ne, critson Differential Revision: https://reviews.llvm.org/D119696	2022-03-21 09:31:59 +01:00
Jon Chesterfield	bcbd4cf1f2	Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.	2022-03-20 00:58:26 +00:00
Jon Chesterfield	98e474c1b3	[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal	2022-03-19 16:42:17 +00:00
Stanislav Mekhanoshin	e9a49c6483	[AMDGPU] gfx940 basic speed model This is incomplete and will handle more instructions as they are added. Differential Revision: https://reviews.llvm.org/D121966	2022-03-18 13:19:47 -07:00
Stanislav Mekhanoshin	4570527e72	[AMDGPU] Disable some MFMA instructions on gfx940 Differential Revision: https://reviews.llvm.org/D121956	2022-03-18 13:19:12 -07:00
Stanislav Mekhanoshin	0a79e1f30a	[AMDGPU] reuse blgp as neg in 2 mfma operations on gfx940 GFX940 repurposes BLGP as NEG only in DGEMM MFMA. Differential Revision: https://reviews.llvm.org/D121745	2022-03-18 12:56:51 -07:00
Abinav Puthan Purayil	aee3684995	[AMDGPU] Use COPY_TO_REGCLASS for buffer_atomic_cmpswap selection GlobalISel was selecting the av_* regclass for some cases. Differential Revision: https://reviews.llvm.org/D121933	2022-03-18 08:56:23 +05:30
Changpeng Fang	dd5895cc39	AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265	2022-03-17 14:12:36 -07:00
Stanislav Mekhanoshin	d9ac55fab2	[AMDGPU] New MFMA names for existing instructions Old names are supported as aliases. _1k MFMA got new opcodes. Differential Revision: https://reviews.llvm.org/D121741	2022-03-17 13:05:36 -07:00
Stanislav Mekhanoshin	522b259976	[AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940 Differential Revision: https://reviews.llvm.org/D121843	2022-03-17 12:12:06 -07:00
Vang Thao	27e1931508	[AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these. Differential Revision: https://reviews.llvm.org/D121874	2022-03-17 11:38:53 -07:00
Jay Foad	313f306b26	[AMDGPU] Stop using getMinimalPhysRegClass in LowerFormalArguments NFCI. The motivation for this is to avoid problems in future if we add new classes containing only a subset of all VGPRs, or a subset of all SGPRs. getMinimalPhysRegClass would favour these smaller classes, which is not what we want here. Differential Revision: https://reviews.llvm.org/D121914	2022-03-17 15:19:17 +00:00
Dmitry Preobrazhensky	9c632b61eb	[AMDGPU][MC] A fix for commit 5977dfb The commit code `5977dfba64` failed to compile with GCC5. This patch addresses the issue. For a related discussion, see https://reviews.llvm.org/D121696	2022-03-17 14:41:21 +03:00
Abinav Puthan Purayil	f59cb41ba1	[AMDGPU] Select buffer_atomic_cmpswap* in tblgen This change replaces the manual selection of buffer_atomic_cmpswap* instructions in SelectionDAG and GlobalISel with a tblgen based selection in BUFInstructions.td. This allows us to select the return and no-return variants in tblgen. Differential Revision: https://reviews.llvm.org/D121770	2022-03-17 10:12:32 +05:30
Christudasan Devadasan	6dd21d1db1	[AMDGPU][SIFoldOperands] Consider the alignment constraints Enforced an alignment check while folding the operands.	2022-03-17 08:27:53 +05:30


6770 Commits