llvm-project

Author	SHA1	Message	Date
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885)	2023-09-21 12:22:55 +02:00
Matt Arsenault	8f18cf77e7	AMDGPU: Check for implicit defs before constant folding instruction Can't delete the constant folded instruction if scc is used. Fixes #63986 https://reviews.llvm.org/D157504	2023-08-11 10:29:53 -04:00
pvanhout	361e9eec51	[AMDGPU] Correctly emit AGPR copies in tryFoldPhiAGPR - Don't create COPY instructions between PHI nodes. - Don't create V_ACCVGPR_WRITE with operands that aren't AGPR_32 Solves SWDEV-410408 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155080	2023-07-13 08:55:22 +02:00
pvanhout	026fc9e9c4	[AMDGPU] Handle Additional Cases in tryFoldPhiAGPR Sometimes PHIs have different incoming values, such as: ``` %1:vgpr_256 = COPY %0:agpr_256 %2:vgpr_32 = COPY %1:vgpr_256.sub0 ``` Those weren't handled, which could lead to massive performance issues if break-large-PHIs kicked in + AGPRs were used (MFMA) Fixes SWDEV-407986 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D153879	2023-06-29 14:49:18 +02:00
Ivan Kosarev	4e312abdfd	[AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152257	2023-06-07 11:41:11 +01:00
pvanhout	6b971325e9	[AMDGPU] Fold more AGPR copies/PHIs in SIFoldOperands Generalize `tryFoldLCSSAPhi` into `tryFoldPhiAGPR` which works on any kind of PHI node (not just LCSSA ones) and attempts to create AGPR Phis more aggressively. Also adds a GFX908-only "cleanup" function `tryOptimizeAGPRPhis` which tries to minimize AGPR to AGPR copies on GFX908, which doesn't have an ACCVGPR MOV instruction (so AGPR-AGPR copies become 2 or 3 instructions as they need a VGPR temp). This is needed because D143731 + the new `tryFoldPhiAGPR` may create a lot more PHIs (one 32xfloat PHI becomes 32 float phis), and if each PHI hits the same AGPR (like in `test_mfma_loop_agpr_init`) they will be lowered to 32 copies from the same AGPR, which will each become 2-3 instructions. Creating a VGPR cache in this case prevents all those copies from being generated (we have AGPR-VGPR copies instead which are trivial). This is a preparation patch intended to prevent regressions in D143731 when AGPRs are involved. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144099	2023-03-28 09:33:12 +02:00
Jay Foad	e2eee902a4	[AMDGPU] Fix an assertion failure when folding into src2 of V_FMAC_F16 D139469 "[AMDGPU] Enable OMod on more VOP3 instructions" caused an assertion failure when trying to fold into src2 of V_FMAC_F16. It would temporarily convert the instruction to V_FMA_F16_gfx9 and add an opsel operand, but if the fold still failed then it would forget to remove the opsel operand. Differential Revision: https://reviews.llvm.org/D144558	2023-02-22 14:26:03 +00:00
Yashwant Singh	2a832d0f09	[AMDGPU] Add missing physical register check in SIFoldOperands::tryFoldLoad tryFoldLoad() is not meant to work on physical registers; moreover, use_nodbg_instructions(reg) makes the compiler buggy when called with a physical reg. Fix for SWDEV-373493 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141895	2023-01-24 14:24:41 +05:30
Jay Foad	073401e59c	[MC] Define and use MCInstrDesc implicit_uses and implicit_defs. NFC. The new methods return a range for easier iteration. Use them everywhere instead of getImplicitUses, getNumImplicitUses, getImplicitDefs and getNumImplicitDefs. A future patch will remove the old methods. In some use cases the new methods are less efficient because they always have to scan the whole uses/defs array to count its length, but that will be fixed in a future patch by storing the number of implicit uses/defs explicitly in MCInstrDesc. At that point there will be no need to 0-terminate the arrays. Differential Revision: https://reviews.llvm.org/D142215	2023-01-23 14:44:58 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Matt Arsenault	4463badf46	AMDGPU: Use DenormalMode type in FP mode tracking This simplifies a future patch. The MIR handling should be fixed. We're still printing these in custom MachineFunctionInfo as bools (plus the inverted meaning is hard to follow).	2022-12-21 20:35:48 -05:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Joe Nash	bbfbec94b1	[AMDGPU] Enable OMod on more VOP3 instructions OMod was disabled if OpSel was enabled, but that restriction is more specific than necessary. Any VOP3 with float operands can use OMod. On GFX11, FMAC_F16_e64 can use op_sel. Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139469	2022-12-07 13:30:33 -05:00
Jay Foad	38302c60ef	[AMDGPU] Stop looking for implicit M0 uses on MOV instructions Before D114230, indirect moves used regular MOV opcodes and were identified by having an implicit use of M0. Since D114230 they use dedicated opcodes instead, so remove some old code that checks for implicit uses of M0. NFCI. Differential Revision: https://reviews.llvm.org/D138308	2022-11-18 16:57:55 +00:00
Jay Foad	49762162ea	[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands. To avoid a profusion of similar functions with undocumented differences, this patch removes all the isLiteralConstant* variants. Callers are responsible for handling the isReg case. Differential Revision: https://reviews.llvm.org/D125759	2022-11-17 16:45:48 +00:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Pierre van Houtryve	b5f9972345	[SIFoldOperands] Small code cleanups, NFC. I've been trying to understand the backend better and decided to read the code of this pass. While doing so, I noticed parts that could be refactored to be a tiny bit clearer. I tried to keep the changes minimal, a non-exhaustive list of changes is: - Stylistic changes to better fit LLVM's coding style - Removing dead/useless functions (e.g. FoldCandidate had getters, but it's a public struct!) - Saving regs/opcodes in variables if they're going to be used multiple times in the same condition Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137539	2022-11-08 07:51:48 +00:00
Pierre van Houtryve	70c781f4b6	[SIFoldOperands] Move `isFoldableCopy` into a separate helper, NFC. There was quite a bit of logic there that was just in the middle of core loop. I think it makes it easier to follow when it's split off in a separate helper like the others. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137538	2022-11-08 07:44:34 +00:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Matt Arsenault	7834194837	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.	2022-09-12 09:03:37 -04:00
Jay Foad	96dfa523c2	[AMDGPU] Refactor SIFoldOperands. NFC. Refactor static functions into class methods so they have access to TII, MRI etc.	2022-09-07 11:05:01 +01:00
Kazu Hirata	9861a68a7c	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 10:41:50 -07:00
Carl Ritson	dbda30e294	[AMDGPU][SIFoldOperands] Clear kills when folding COPY Clear all kill flags on source register when folding a COPY. This is necessary because the kills may now be out of order with the uses. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130622	2022-07-28 11:57:55 +09:00
Jay Foad	3eb2281bc0	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643	2022-05-18 10:19:35 +01:00
Christudasan Devadasan	6dd21d1db1	[AMDGPU][SIFoldOperands] Consider the alignment constraints Enforced an alignment check while folding the operands.	2022-03-17 08:27:53 +05:30
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Stanislav Mekhanoshin	c4500de255	[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions Differential Revision: https://reviews.llvm.org/D121634	2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin	36fe3f13a9	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254	2022-03-14 15:23:36 -07:00
Christudasan Devadasan	0d849b8249	AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users Use TII::getRegClass to return a valid regclass or a nullptr if the RC is unknown for a given OpIdx. This fixes a potential crash that occurred while getting the RC from a variadic instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120813	2022-03-08 10:11:57 +05:30
Stanislav Mekhanoshin	e7b362d75d	[AMDGPU] Add v_mov_b64 gfx940 opcode Differential Revision: https://reviews.llvm.org/D121023	2022-03-07 12:07:12 -08:00
Jay Foad	69ab233a15	[AMDGPU] Return better Changed status from SIFoldOperands Differential Revision: https://reviews.llvm.org/D120023	2022-02-18 10:35:48 +00:00
Stanislav Mekhanoshin	dbf278b984	[AMDGPU] Prevent aliasing of SrcC and Dst in MAI From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoided as new value could be written to accumulator input before it gets read. Note that this inevitably increases register pressure to the point where some programs will become uncompilable. This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst. Fixes: SWDEV-318900 Differential Revision: https://reviews.llvm.org/D117844	2022-01-26 14:48:20 -08:00
Jack Andersen	f108c7f59d	[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues. Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852	2021-12-05 15:55:59 -05:00
Christudasan Devadasan	654c89d85a	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300	2021-11-26 00:42:12 -05:00
Zarko Todorovski	5b8bbbecfa	[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target Reworded or removed code comments that contain `sanity check` and `sanity test`.	2021-11-17 21:59:00 -05:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
Mikael Holmen	e7b169a8ae	[AMDGPU] Fix gcc warnings about unused variables [NFC]	2021-09-23 08:08:00 +02:00
Jay Foad	0205806d0f	[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3- only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156	2021-09-22 09:36:34 +01:00
Sebastian Neubauer	f3fe44fa05	[AMDGPU] Fix too many constants with flat scratch Prevent SIFoldOperands from creating SALU instructions with a constant and a frame index. Previously, only one operand was checked to be a frame index, leading to too many constants when flat scratch is enabled and stack offsets are large. Differential Revision: https://reviews.llvm.org/D108368	2021-08-20 08:21:36 +02:00
Matt Arsenault	39f8a792f0	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.	2021-06-22 13:42:49 -04:00
Jay Foad	7c706af03b	[AMDGPU] SIFoldOperands: clean up tryConstantFoldOp First clean up the strange API of tryConstantFoldOp where it took an immediate operand value, but no indication of which operand it was the value for. Second clean up the loop that calls tryConstantFoldOp so that it does not have to restart from the beginning every time it folds an instruction. This is NFCI but there are some minor changes caused by the order in which things are folded. Differential Revision: https://reviews.llvm.org/D100031	2021-05-06 09:55:22 +01:00
Matt Arsenault	b58332774f	AMDGPU: Fix assert on inline asm on gfx90a This was assuming all mayLoad instructions have one def.	2021-04-23 09:00:25 -04:00
Matt Arsenault	987e52851e	AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies	2021-04-21 21:58:18 -04:00
Jay Foad	323ef0eb45	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Reapply with a fix for using InstToErase after it had been erased. Differential Revision: https://reviews.llvm.org/D100188	2021-04-19 12:05:41 +01:00
Mitch Phillips	3d4730a73f	Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs" This reverts commit d19a42eba98fe853dd52f7dc89d8cd2727c7fc1c. Reason: Broke the ASan buildbots. See the original phabricator review for more details: https://reviews.llvm.org/D100188	2021-04-09 15:47:44 -07:00
Jay Foad	d19a42eba9	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Differential Revision: https://reviews.llvm.org/D100188	2021-04-09 20:41:09 +01:00
Jay Foad	a4ced03d34	[AMDGPU] SIFoldOperands: eagerly delete dead copies This is cheap to implement, means less work for future passes like MachineDCE, and slightly improves the folding in some cases. Differential Revision: https://reviews.llvm.org/D100117	2021-04-09 13:52:54 +01:00
Jay Foad	a1a372dfb5	[AMDGPU] SIFoldOperands: remove an unneeded isReg check. NFC.	2021-04-08 16:37:43 +01:00


199 Commits