llvm-project

Author	SHA1	Message	Date
Jeffrey Byrnes	1f08d3bc3a	[AMDGPU] Further reduce attaching of implicit operands to spills Extension of https://reviews.llvm.org/D141101 to even further reduce the number of implicit operands we attach. The main benefit is to improve the capability of the post-ra scheduler, and reduce unneeded dependency resolution (e.g. inserting snops). Unfortunately, if we completely minimize the number of implicit operands (naively), we run into some regressions (e.g. dual_movs are replaced with multiple calls to v_mov). This is even more reason to switch to LiveRegUnits. Nonetheless, this patch removes the operands which we can for free (more or less). Change-Id: Ib4f409202b36bdbc59eed615bc2d19fa8bd8c057 Differential Revision: https://reviews.llvm.org/D141557 Change-Id: I8b039e3c0d39436b384083f8beb947ee1b1730b2	2023-01-19 14:31:07 -08:00
Jay Foad	f460c66581	[AMDGPU] Simplify getNumFlatOffsetBits. NFC. Previously we considered this field to be either N-bit unsigned or N+1-bit signed, depending on the instruction. I think it's conceptually simpler to say that the field is always N+1-bit signed, but some instructions do not allow negative values. Differential Revision: https://reviews.llvm.org/D140883	2023-01-12 10:40:36 +00:00
Ruiling Song	cce24b6af0	AMDGPU: Remove IsSourceOfDivergence check This bit is not set/reserved in td file. Let's remove it for now, we can always add it back if we need it. Reviewed by: foad Differential Revision: https://reviews.llvm.org/D141223	2023-01-11 09:59:35 +08:00
Jeffrey Byrnes	596c558155	[AMDGPU] More selectively attach implicit operands to agpr spills Implicit def operands are needed when we spill partially undef super registers by each individual subregister. The implicit-def operands will allow us to lower spills without the verifier complaining. Currently, we are overzealously attaching implicit operands, when we really only need them on the first sub reg spill op. By more selectively attaching the implicit ops, we will free up some unneeded dependencies for the post-ra scheduler. Moreover, this enables a previously incorrect optimization / resolves a correctness issue in indirectCopyToAGPR. When lowering AGPR copies on GFX908, we can improve CodeGen by reusing accvgpr_writes. However, we could not reliably determine which agprs accvgpr_writes actually define due to implicit-defs. Differential Revision: https://reviews.llvm.org/D141101	2023-01-09 15:10:06 -08:00
Ivan Kosarev	2d945ef864	[AMDGPU][NFC] Rename GFX10A16 operands. They do not seem to be GFX10-specific anymore. Also renames the corresponding feature. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D141069	2023-01-09 17:18:46 +00:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Christudasan Devadasan	a3028239a7	Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs" This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.	2022-12-21 16:17:42 +05:30
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a static mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Sameer Sahasrabuddhe	475ce4c200	RFC: Uniformity Analysis for Irreducible Control Flow Uniformity analysis is a generalization of divergence analysis to include irreducible control flow: 1. The proposed spec presents a notion of "maximal convergence" that captures the existing convention of converging threads at the headers of natural loops. 2. Maximal convergence is then extended to irreducible cycles. The identity of irreducible cycles is determined by the choices made in a depth-first traversal of the control flow graph. Uniformity analysis uses criteria that depend only on closed paths and not cycles, to determine maximal convergence. This makes it a conservative analysis that is independent of the effect of DFS on CycleInfo. 3. The analysis is implemented as a template that can be instantiated for both LLVM IR and Machine IR. Validation: - passes existing tests for divergence analysis - passes new tests with irreducible control flow - passes equivalent tests in MIR and GMIR Based on concepts originally outlined by Nicolai Haehnle <nicolai.haehnle@amd.com> With contributions from Ruiling Song <ruiling.song@amd.com> and Jay Foad <jay.foad@amd.com>. Support for GMIR and lit tests for GMIR/MIR added by Yashwant Singh <yashwant.singh@amd.com>. Differential Revision: https://reviews.llvm.org/D130746	2022-12-20 07:22:24 +05:30
Christudasan Devadasan	40ba0942e2	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes so many restrictions that we ended up with unsuccessful SGPR spilling when there aren't enough VGPRs, and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer, which is an unproblematic case. This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124196	2022-12-17 11:56:32 +05:30
Christudasan Devadasan	b5efec4b27	[CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot With D134950, targets get notified when a virtual register is created and/or cloned. Targets can do the needful with the delegate callback. AMDGPU propagates the virtual register flags maintained in the target file itself. They are useful to identify a certain type of machine operands while inserting spill stores and reloads. Since RegAllocFast spills the physical register itself, there is no way its virtual register can be mapped back to retrieve the flags. It can be solved by passing the virtual register as an additional argument. This argument has no use when the spill interfaces are called during the greedy allocator or even the PrologEpilogInserter and can pass a null register in such cases. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138656	2022-12-17 11:55:34 +05:30
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Joe Nash	bbfbec94b1	[AMDGPU] Enable OMod on more VOP3 instructions OMod was disabled if OpSel was enabled, but that restriction is more specific than necessary. Any VOP3 with float operands can use OMod. On GFX11, FMAC_F16_e64 can use op_sel. Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139469	2022-12-07 13:30:33 -05:00
Piotr Sobczak	1e3abd82b9	[AMDGPU] Fix wide spills Update spill code to account for new vector types with bit widths: 288, 320, 352, 384. Related to D138205. Differential Revision: https://reviews.llvm.org/D139203	2022-12-07 10:23:52 +01:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Jay Foad	49762162ea	[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands. To avoid a profusion of similar functions with undocumented differences, this patch removes all the isLiteralConstant* variants. Callers are responsible for handling the isReg case. Differential Revision: https://reviews.llvm.org/D125759	2022-11-17 16:45:48 +00:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Jay Foad	ea60545b0e	[AMDGPU] Create new instructions in SIInstrInfo::moveToVALU Create new VALU instructions in moveToVALU instead of mutating the existing SALU instruction. This makes it easier to add extra operands so we can convert to the VOP3 form of VALU instructions. NFCI but it does have the minor side effect of removing duplicate implicit operands that were present on the original SALU if they are default implicit operands for the VALU. Differential Revision: https://reviews.llvm.org/D137324	2022-11-04 07:21:11 +00:00
Kazu Hirata	b0e0cdf1c7	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:7383: warning: enumerated mismatch in conditional expression: ‘llvm::AMDGPU::UfmtGFX11::UnifiedFormat’ vs ‘llvm::AMDGPU::UfmtGFX10::UnifiedFormat’	2022-10-30 13:09:59 -07:00
Jay Foad	9bb1e21f07	[AMDGPU] Clean up calls to MachineOperand::setIsDead and friends. NFC.	2022-10-28 10:44:08 +01:00
Carl Ritson	a3646ec1bc	[AMDGPU] Add pseudo wavemode to optimize strict_wqm Strict WQM does not require a WQM transition if it occurs within an existing WQM section. This occurs heavily in GFX11 pixel shaders with LDS_PARAM_LOAD, which leads to unnecessary EXEC mask manipulation. To avoid these transitions, detect WQM -> Strict WQM -> WQM and substitute new ENTER_PSEUDO_WM/EXIT_PSEUDO_WM markers instead. These are treated similarly by the WWM register pre-allocation pass, but do not manipulate EXEC or use registers to save EXEC state. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D136813	2022-10-28 09:45:17 +09:00
Jay Foad	191d70f2f5	[AMDGPU] Use Register in more places in SIInstrInfo. NFC. Also avoid using AMDGPU::NoRegister when it's not needed.	2022-10-25 15:04:58 +01:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Ruiling Song	0404aafbe3	AMDGPU: Factor out hasDivergentBranch(). NFC This is helpful for detecting whether a block ends with divergent branch in passes before lowering the pseudo control flow instructions. Differential Revision: https://reviews.llvm.org/D133184	2022-09-14 13:27:21 +08:00
Matt Arsenault	7834194837	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.	2022-09-12 09:03:37 -04:00
Jay Foad	afa0ed33df	[AMDGPU] Fix shrinking of F16 FMA on newer subtargets D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of the opcode which is used on GFX9+. Differential Revision: https://reviews.llvm.org/D133489	2022-09-08 16:41:04 +01:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Kazu Hirata	d0ec61c9ff	[Target] Remove unused forward declarations (NFC)	2022-08-07 00:16:16 -07:00
Jay Foad	c24d68fff1	[AMDGPU] Take advantage of VOP3 literals in convertToThreeAddress This improves a corner case where v_fmac can be converted to v_fma on GFX10+ even if it has a literal operand. Differential Revision: https://reviews.llvm.org/D130992	2022-08-02 17:27:11 +01:00
Stanislav Mekhanoshin	68901fdbeb	[AMDGPU] Consider S_SETPRIO a scheduling boundary The instruction is used to modify wave priority with the intent to affect VALU execution and currently we can reschedule VALU around it since that VALU does not have side effects. Differential Revision: https://reviews.llvm.org/D130654	2022-07-27 11:50:23 -07:00
Joe Nash	b28bb8cc9c	[AMDGPU] Remove old operand from VOPC DPP For most DPP instructions, the old operand stores the value that was in the current lane before the DPP operation, and is tied to the destination. For VOPC DPP, this is unnecessary and incorrect. There appears to have been a latent bug related to D122737 with SIInstrInfo::isOperandLegal. If you checked whether a register operand was legal when the InstructionDesc expected an immediate, it reported that it is valid. Its fix is necessary for and tested in this patch. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D130040	2022-07-19 09:35:05 -04:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Ivan Kosarev	432cbd7827	[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129381	2022-07-18 11:29:31 +01:00
Jay Foad	e45aa230ad	[AMDGPU] Update LiveVariables after killing an immediate def D114999 added code to kill an immediate def if it was folded into its only use by convertToThreeAddress. This patch updates LiveVariables when that happens in order to fix verification failures exposed by D129213. Differential Revision: https://reviews.llvm.org/D129661	2022-07-14 10:49:41 +01:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Matt Arsenault	d342d130da	AMDGPU: Use isMeta flags on pseudoinstructions	2022-06-29 10:31:29 -04:00
Stanislav Mekhanoshin	21895c6b50	[AMDGPU] Relax verification of soffset in scalar stores It must use m0 only on GFX8. Later chips can use any SGPR. Differential Revision: https://reviews.llvm.org/D128765	2022-06-28 16:10:08 -07:00
Joe Nash	f1cfaa956d	[AMDGPU] Use GFX11 S_PACK_HL instruction in more cases Differential Revision: https://reviews.llvm.org/D128527	2022-06-28 14:35:19 +01:00
Austin Kerbow	bd9eed3aec	[AMDGPU] Add isMFMA helper function. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127124	2022-06-14 22:01:49 -07:00
Stanislav Mekhanoshin	cb9ae93712	[AMDGPU] Define SGPR_NULL64 register. NFCI. On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen. Differential Revision: https://reviews.llvm.org/D127527	2022-06-13 13:23:33 -07:00
Stanislav Mekhanoshin	0f81830632	[AMDGPU] Make temp vgpr selection stable in indirectCopyToAGPR This uses a rotating remainder of division by 3 to select another temp vgpr each next time in a sequence of several agpr copies. Therefore, temp vgpr selection depends on the generated agpr number. This number could change with any unrelated change to the register definitions. Stabilize the selection by using a real agpr number. Differential Revision: https://reviews.llvm.org/D127524	2022-06-13 09:39:46 -07:00
Matt Arsenault	0e1c71e4a4	CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function. Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.	2022-06-01 09:45:40 -04:00
Stanislav Mekhanoshin	5df6669d45	[AMDGPU] Enforce alignment of image vaddr on gfx90a Even though single address image instructions only use a single VGPR, HW accesses 4 or 5, which creates an alignment requirement. Fixes: SWDEV-316648 Differential Revision: https://reviews.llvm.org/D126009	2022-05-24 10:05:39 -07:00
Jay Foad	78ec59e6ae	[AMDGPU] Handle mandatory literals in isOperandLegal Extend SIInstrInfo::isOperandLegal to enforce a limit on the number of literal operands for all VALU instructions, not just VOP3. In particular it now handles VOP2 instructions with a mandatory literal operand like V_FMAAK_F32. Differential Revision: https://reviews.llvm.org/D126064	2022-05-20 16:14:00 +01:00
Jay Foad	5b18ef7256	[AMDGPU] Add verification for mandatory literals Extend the literal operand checking in SIInstrInfo::verifyInstruction to check VOP2 instructions like V_FMAAK_F32 which have a mandatory literal operand. The rule is that src0 can also be a literal, but only if it is the same literal value. AMDGPUAsmParser::validateConstantBusLimitations already handles this correctly. Differential Revision: https://reviews.llvm.org/D126063	2022-05-20 16:14:00 +01:00
Jay Foad	d14f2a6359	[AMDGPU] Allow multiple uses of the same literal in SOP2/SOPC AMDGPUAsmParser::validateSOPLiteral already knew about this but SIInstrInfo::verifyInstruction did not. Differential Revision: https://reviews.llvm.org/D125976	2022-05-19 16:42:20 +01:00
Stanislav Mekhanoshin	dee3190293	[AMDGPU] Add llvm.amdgcn.global.load.lds intrinsic Differential Revision: https://reviews.llvm.org/D125279	2022-05-17 12:35:27 -07:00


704 Commits