llvm-project

Author	SHA1	Message	Date
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of remaining addresses. Update legalizer to pack registers for this new format and allow instruction selection to use NSA encoding when number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Jay Foad	a07584d57d	[CodeGen] Make more use of MachineOperand::getOperandNo. NFC. Differential Revision: https://reviews.llvm.org/D143252	2023-02-07 11:50:57 +00:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Jay Foad	073401e59c	[MC] Define and use MCInstrDesc implicit_uses and implicit_defs. NFC. The new methods return a range for easier iteration. Use them everywhere instead of getImplicitUses, getNumImplicitUses, getImplicitDefs and getNumImplicitDefs. A future patch will remove the old methods. In some use cases the new methods are less efficient because they always have to scan the whole uses/defs array to count its length, but that will be fixed in a future patch by storing the number of implicit uses/defs explicitly in MCInstrDesc. At that point there will be no need to 0-terminate the arrays. Differential Revision: https://reviews.llvm.org/D142215	2023-01-23 14:44:58 +00:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Jay Foad	38302c60ef	[AMDGPU] Stop looking for implicit M0 uses on MOV instructions Before D114230, indirect moves used regular MOV opcodes and were identified by having an implicit use of M0. Since D114230 they use dedicated opcodes instead, so remove some old code that checks for implicit uses of M0. NFCI. Differential Revision: https://reviews.llvm.org/D138308	2022-11-18 16:57:55 +00:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Jay Foad	2e8863b6a1	[AMDGPU] Don't shrink VOP3 instructions pre-RA on GFX10+ In GFX10, there is no advantage to shrinking these instructions pre-RA, so this just saves a bit of work. In GFX11 there is an advantage to not shrinking them pre-RA, because the register classes for 16-bit operands are less restrictive in the VOP3 form than in the shrunk form. This patch is a prerequisite for actually setting up those register classes correctly for 16-bit vs non-16-bit operands. Differential Revision: https://reviews.llvm.org/D133769	2022-09-13 20:26:08 +01:00
Jay Foad	afa0ed33df	[AMDGPU] Fix shrinking of F16 FMA on newer subtargets D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of the opcode which is used on GFX9+. Differential Revision: https://reviews.llvm.org/D133489	2022-09-08 16:41:04 +01:00
Jay Foad	c155a944fb	[AMDGPU] GFX11 CodeGen support for MIMG instructions This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAGE_SAMPLE instructions out into separate test files since they are no longer supported in GFX11 Differential Revision: https://reviews.llvm.org/D127837	2022-06-16 18:23:14 +01:00
David Stuttard	77851cc1cf	[AMDGPU] Change use null for dead sdst to be gfx1030+ Pre gfx1030 null for sdst is different. c97436f8b6e2 [AMDGPU] Use null for dead sdst operand - requires a change to make it not apply to pre gfx1030 Differential Revision: https://reviews.llvm.org/D127869	2022-06-16 10:39:06 +01:00
Stanislav Mekhanoshin	c97436f8b6	[AMDGPU] Use null for dead sdst operand Differential Revision: https://reviews.llvm.org/D127542	2022-06-13 14:41:40 -07:00
Jay Foad	e2926501d8	[AMDGPU] Aggressively fold immediates in SIShrinkInstructions Fold immediates regardless of how many uses they have. This is expected to increase overall code size, but decrease register usage. Differential Revision: https://reviews.llvm.org/D114644	2022-05-18 11:04:33 +01:00
Jay Foad	dd12c3433e	[AMDGPU] Shrink F16 MAD/FMA to MADAK/MADMK/FMAAK/FMAMK on GFX10 Differential Revision: https://reviews.llvm.org/D125803	2022-05-18 10:00:06 +01:00
Jay Foad	27fa41583f	[AMDGPU] Shrink MAD/FMA to MADAK/MADMK/FMAAK/FMAMK on GFX10 On GFX10 VOP3 instructions can have a literal operand, so the conversion from VOP3 MAD/FMA to VOP2 MADAK/MADMK/FMAAK/FMAMK will not happen in SIFoldOperands. The only benefit of the VOP2 form is code size, so do it in SIShrinkInstructions instead. Differential Revision: https://reviews.llvm.org/D125567	2022-05-16 15:15:23 +01:00
Jay Foad	c1af2d329f	[AMDGPU] SIShrinkInstructions: change static functions to methods This is a mechanical change to avoid passing MRI and TII around explicitly. NFC. Differential Revision: https://reviews.llvm.org/D125566	2022-05-16 09:43:41 +01:00
Thomas Symalla	718aec209c	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Same as D119696 including a buildbot and MIR test fix. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D122332	2022-03-25 11:40:18 +01:00
Thomas Symalla	7de6107dce	Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3." This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64. Differential Revision: https://reviews.llvm.org/D122117	2022-03-21 09:50:44 +01:00
Thomas Symalla	011c64191e	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Reviewed By: sebastian-ne, critson Differential Revision: https://reviews.llvm.org/D119696	2022-03-21 09:31:59 +01:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Jay Foad	476bb2d94e	[AMDGPU] Remove dead code from shrinkScalarLogicOp It looks like this code has been dead since shrinkScalarLogicOp was introduced in svn r348601.	2022-02-09 17:07:12 +00:00
Jay Foad	16de2c09dd	[AMDGPU] SIShrinkInstructions: sink code to where it's used. NFC.	2021-12-13 14:46:40 +00:00
Jay Foad	63681527ee	[AMDGPU] SIShrinkInstructions: remove redundant check canShrink already calls hasVALU32BitEncoding, so there is no need to call it again here.	2021-12-13 14:46:40 +00:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Jay Foad	74cd4dee20	[AMDGPU] Preserve deadness of vcc when shrinking instructions This doesn't have any effect on codegen now, but it might do in the future if we shrink instructions before post-RA scheduling, which is sensitive to live vs dead defs. Differential Revision: https://reviews.llvm.org/D112305	2021-10-22 14:22:24 +01:00
Carl Ritson	6efb3220b4	[AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr. Previously these were rounded up to VReg_256; this saves VGPRs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103800	2021-07-22 10:42:15 +09:00
Carl Ritson	f8816c7400	[AMDGPU] Add v5f32/VReg_160 support for MIMG instructions Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are required for a MIMG address operand. Maintain _V8 instruction variants of pseudo instructions allowing assembly prior to GFX10 to work as-is. Currently the validator can tell for GFX10 what the correct size is, so will disallow oversize address registers. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103672	2021-06-08 11:11:40 +09:00
Jay Foad	16d707e656	[AMDGPU] Fix v_swap_b32 formation on physical registers As explained in the comments, matchSwap matches: // mov t, x // mov x, y // mov y, t and turns it into: // mov t, x (t is potentially dead and move eliminated) // v_swap_b32 x, y On physical registers we don't have full use-def chains so the check for T being live-out was not working properly with subregs/superregs. Differential Revision: https://reviews.llvm.org/D101546	2021-04-29 20:53:40 +01:00
Piotr Sobczak	fc8e741121	[AMDGPU] Avoid an illegal operand in si-shrink-instructions Before the patch it was possible to trigger a constant bus violation when folding immediates into a shrunk instruction. The patch adds a check to enforce the legality of the new operand. Differential Revision: https://reviews.llvm.org/D95527	2021-01-28 08:49:21 +01:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Stanislav Mekhanoshin	ad8131bb03	[AMDGPU] Fix VC warning about signed/unsigned comparison. NFC. This is the warning reported in https://reviews.llvm.org/D89599	2020-10-26 11:55:57 -07:00
Stanislav Mekhanoshin	611959f004	[AMDGPU] Fixed v_swap_b32 match 1. Fixed liveness issue with implicit kills. 2. Fixed potential problem with an indirect mov. Fixes: SWDEV-256848 Differential Revision: https://reviews.llvm.org/D89599	2020-10-21 10:14:24 -07:00
Austin Kerbow	ebdcef20ce	[AMDGPU] Avoid inserting noops during scheduling Passes that are run after the post-RA scheduler may insert instructions like waitcnt which eliminate the need for certain noops. After this patch the scheduler is still aware of possible latency from hazards but noops will not be inserted until the dedicated hazard recognizer pass is run. Depends on D89753. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89754	2020-10-20 17:11:36 -07:00
Stanislav Mekhanoshin	91f503c3af	[AMDGPU] gfx1030 RT support Differential Revision: https://reviews.llvm.org/D87782	2020-09-16 11:40:58 -07:00
Michael Liao	bf41c4d29e	[codegen] Ensure target flags are cleared/set properly. NFC. - When an operand is changed into an immediate value or like, ensure their target flags being cleared or set properly. Differential Revision: https://reviews.llvm.org/D87109	2020-09-03 18:37:39 -04:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Matt Arsenault	3e52667433	AMDGPU: Fix verifier error with undef source producing s_bitset* This needs to preserve the undef flag.	2020-08-05 14:42:20 -04:00
Jay Foad	1658b8d7dd	[AMDGPU] Avoid using s_cmpk when src0 is not register The hardware spec requires src0 of s_cmpk to be a register. So, we should not optimize s_cmp to s_cmpk if src0 is not a register. Patch by Ruiling Song!	2020-07-14 09:05:53 +01:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Matt Arsenault	f596ab4066	AMDGPU: Use early return	2020-04-07 13:48:00 -04:00
Changpeng Fang	6370c7c13e	AMDGPU: Limit the search in finding the instruction pattern for v_swap generation. Summary: Current implementation of matchSwap in SIShrinkInstructions searches the entire use_nodbg_operands set to find the possible pattern to generate v_swap instruction. This approach will lead to an O(N^3) compile time for SIShrinkInstructions. But in reality, the matching pattern only exists within nearby instructions in the same basic block. This work limits the search to a maximum of 16 instructions, and has a linear compile time consumption. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D74180	2020-02-07 11:06:33 -08:00
Stanislav Mekhanoshin	cacc3b7a55	[AMDGPU] Cleanup assumptions about generated subregs We are using countPopulation on a LaneBitmask to determine a number of registers it covers. This is the assumption which does not necessarily need to be true. It is not changed but factored into a single call SIRegisterInfo::getNumCoveredRegs(). Some other places are cleaned up with respect to assumptions about subreg indexes values and tablegen behavior. Differential Revision: https://reviews.llvm.org/D74177	2020-02-06 17:39:24 -08:00
Stanislav Mekhanoshin	2863c26968	Revert "AMDGPU: Limit the search in finding the instruction pattern for v_swap generation." This reverts commit 982780648124243131c6617c0d97fc1cb02d4e75.	2020-02-06 17:38:55 -08:00
Changpeng Fang	9827806481	AMDGPU: Limit the search in finding the instruction pattern for v_swap generation. Summary: Current implementation of matchSwap in SIShrinkInstructions searches the entire use_nodbg_operands set to find the possible pattern to generate v_swap instruction. This approach will lead to an O(N^3) compile time for SIShrinkInstructions. But in reality, the matching pattern only exists within nearby instructions in the same basic block. This work limits the search to a maximum of 16 instructions, and has a linear compile time consumption. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D74180	2020-02-06 16:40:21 -08:00
Matt Arsenault	edca9ac0de	AMDGPU: Don't fold S_NOPs with implicit operands	2019-10-30 14:40:56 -07:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
Nicolai Haehnle	2710171a15	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile time, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297	2019-06-25 11:52:30 +00:00

108 Commits