llvm-project

Author	SHA1	Message	Date
Kazu Hirata	972983539b	[llvm] Apply fixes from readability-redundant-control-flow (NFC)	2023-04-16 00:13:46 -07:00
Diana Picus	b9ba05360e	[AMDGPU] Don't S_MOV_B32 into $scc The peephole optimizer tries to replace ``` %n:sgpr_32 = S_MOV_B32 x $scc = COPY %n ``` with a `S_MOV_B32` directly into `$scc`. This crashes because `S_MOV_B32` cannot take `$scc` as input. We currently generate code like this from GlobalISel when lowering a G_BRCOND with a constant condition. We should probably look into removing this kind of branch altogether, but until then we should at least not crash. This patch fixes the issue by making sure we don't apply the peephole optimization when trying to move into a physical register that doesn't belong to the correct register class. Differential Revision: https://reviews.llvm.org/D148117	2023-04-14 10:24:43 +02:00
skc7	b434051dc8	[AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147168	2023-04-10 11:34:14 +05:30
Mateja Marjanovic	48f6964bcb	[AMDGPU][GlobalISel] Add support for S_INDIRECT_REG_WRITE_MOVREL_B32_V[9|10|11|12]	2023-03-30 18:27:49 +02:00
Jay Foad	3e3594c771	[AMDGPU] Do not fix implicit vcc operand on INLINEASM An INLINEASM can have an implicit def of vcc. It is not appropriate for fixImplicitOperands to change this to vcc_lo on wave32. Differential Revision: https://reviews.llvm.org/D147157	2023-03-29 20:23:36 +01:00
Mirko Brkusanin	2eada459c7	[AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16 Differential Revision: https://reviews.llvm.org/D145785	2023-03-10 14:47:49 +01:00
Jay Foad	08bdff862c	[AMDGPU] Fix error message for illegal copy	2023-03-03 11:46:01 +00:00
ZHU Zijia	8fccdfa436	[AMDGPU] Remove outdated FIXME in comments [NFC] This case has already been handled by D106449.	2023-03-03 01:34:19 +08:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of remaining addresses. Update legalizer to pack registers for this new format and allow instruction selection to use NSA encoding when the number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Piotr Sobczak	a3d7b3121c	[AMDGPU][NFC] Add getMaxMUBUFImmOffset Replace magic constant 4095 with the function getMaxMUBUFImmOffset(). Differential Revision: https://reviews.llvm.org/D144623	2023-02-23 11:29:59 +01:00
Jay Foad	c9f4df57ca	[AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should not be calling into GCNSubtarget. Differential Revision: https://reviews.llvm.org/D144564	2023-02-22 16:19:05 +00:00
Yashwant Singh	cde2f330b3	[AMDGPU] Introduce never uniform bit field in tablegen IsNeverUniform can be set to 1 to mark instructions which are inherently never-uniform/divergent. Enabling this bit in Writelane instruction for now. To be extended to all required instructions. Reviewed By: arsenm, sameerds, #amdgpu Differential Revision: https://reviews.llvm.org/D143154	2023-02-08 11:45:48 +05:30
Yashwant Singh	422d379de2	[AMDGPU] Use tablegen to list uniform intrinsics Right now we do opcode-wise matching to identify uniform/non-divergent AMDGPU intrinsics. It is duplicated in 2 places: once at IR-level uniformity analysis and once at MIR level. Moving them to a single tablegen table for consistency and adding an API wrapper to access them. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D142961	2023-01-31 17:44:40 +05:30
Matt Arsenault	4002576156	AMDGPU/GlobalISel: Partially fix getGenericInstructionUniformity This was broken for the common case of instructions which are uniform if their inputs are uniform. This is broken for control flow intrinsics since the API currently does not express which result operand is in question. This generates failures in just about every intrinsic test when uniformity analysis is performed without this.	2023-01-30 15:47:18 -04:00
Matt Arsenault	490e348e67	AMDGPU: Partially fix machine uniformity for inline asm This was assuming virtual registers only, and asserting on physical. This was also ignoring AGPRs, and only considering VGPRs. Reporting the instruction as uniform or not is conceptually wrong, this should be reported per-operand. An inline asm statement could include uniform and non-uniform components. This should report purely for the register defs and ignore the uses. Fixes asserting on most of the inline asm tests when uniformity analysis is used.	2023-01-30 15:47:18 -04:00
Matt Arsenault	17ce615c78	AMDGPU: Fix null dereference in getInstructionUniformity This was failing when it couldn't find an allocatable class for special physical register inputs (like $mode), which are all scalars. This avoids numerous test failures when regbankselect is updated to use uniformity analysis.	2023-01-30 15:47:17 -04:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Jay Foad	073401e59c	[MC] Define and use MCInstrDesc implicit_uses and implicit_defs. NFC. The new methods return a range for easier iteration. Use them everywhere instead of getImplicitUses, getNumImplicitUses, getImplicitDefs and getNumImplicitDefs. A future patch will remove the old methods. In some use cases the new methods are less efficient because they always have to scan the whole uses/defs array to count its length, but that will be fixed in a future patch by storing the number of implicit uses/defs explicitly in MCInstrDesc. At that point there will be no need to 0-terminate the arrays. Differential Revision: https://reviews.llvm.org/D142215	2023-01-23 14:44:58 +00:00
Jay Foad	245e3dd948	[MC] Do not copy MCInstrDescs. NFC. Avoid copying MCInstrDesc instances because a future patch will change them to find their implicit operands and operand info array based on their own "this" pointer, so it will only work for MCInstrDescs in the TargetInsts table, not for a copy of an MCInstrDesc at a different address. Differential Revision: https://reviews.llvm.org/D142214	2023-01-23 11:55:49 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Kazu Hirata	caa99a01f5	Use llvm::popcount instead of llvm::countPopulation(NFC)	2023-01-22 12:48:51 -08:00
Jeffrey Byrnes	1f08d3bc3a	[AMDGPU] Further reduce attaching of implicit operands to spills Extension of https://reviews.llvm.org/D141101 to even further reduce the amount of implicit operands we attach. The main benefit is to improve the capability of the post-ra scheduler, and reduce unneeded dependency resolution (e.g. inserting snops). Unfortunately, if we completely minimize the amount of implicit operands (naively), we run into some regressions (e.g. dual_movs are replaced with multiple calls to v_mov). This is even more reason to switch to LiveRegUnits. Nonetheless, this patch removes the operands which we can for free (more or less). Change-Id: Ib4f409202b36bdbc59eed615bc2d19fa8bd8c057 Differential Revision: https://reviews.llvm.org/D141557 Change-Id: I8b039e3c0d39436b384083f8beb947ee1b1730b2	2023-01-19 14:31:07 -08:00
Jay Foad	f460c66581	[AMDGPU] Simplify getNumFlatOffsetBits. NFC. Previously we considered this field to be either N-bit unsigned or N+1-bit signed, depending on the instruction. I think it's conceptually simpler to say that the field is always N+1-bit signed, but some instructions do not allow negative values. Differential Revision: https://reviews.llvm.org/D140883	2023-01-12 10:40:36 +00:00
Ruiling Song	cce24b6af0	AMDGPU: Remove IsSourceOfDivergence check This bit is not set/reserved in td file. Let's remove it for now, we can always add it back if we need it. Reviewed by: foad Differential Revision: https://reviews.llvm.org/D141223	2023-01-11 09:59:35 +08:00
Jeffrey Byrnes	596c558155	[AMDGPU] More selectively attach implicit operands to agpr spills Implicit def operands are needed when we spill partially undef super registers by each individual subregister. The implicit-def operands will allow us to lower spills without the verifier complaining. Currently, we are overzealously attaching implicit operands, when we really only need them on the first sub reg spill op. By more selectively attaching the implicit ops, we will free up some unneeded dependencies for the post-ra scheduler. Moreover, this enables a previously incorrect optimization / resolves a correctness issue in indirectCopyToAGPR. When lowering AGPR copies on GFX908, we can improve CodeGen by reusing accvgpr_writes. However, we could not reliably determine which agprs accvgpr_writes actually define due to implicit-defs. Differential Revision: https://reviews.llvm.org/D141101	2023-01-09 15:10:06 -08:00
Ivan Kosarev	2d945ef864	[AMDGPU][NFC] Rename GFX10A16 operands. They do not seem to be GFX10-specific anymore. Also renames the corresponding feature. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D141069	2023-01-09 17:18:46 +00:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Christudasan Devadasan	a3028239a7	Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs" This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.	2022-12-21 16:17:42 +05:30
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a statically mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Sameer Sahasrabuddhe	475ce4c200	RFC: Uniformity Analysis for Irreducible Control Flow Uniformity analysis is a generalization of divergence analysis to include irreducible control flow: 1. The proposed spec presents a notion of "maximal convergence" that captures the existing convention of converging threads at the headers of natural loops. 2. Maximal convergence is then extended to irreducible cycles. The identity of irreducible cycles is determined by the choices made in a depth-first traversal of the control flow graph. Uniformity analysis uses criteria that depend only on closed paths and not cycles, to determine maximal convergence. This makes it a conservative analysis that is independent of the effect of DFS on CycleInfo. 3. The analysis is implemented as a template that can be instantiated for both LLVM IR and Machine IR. Validation: - passes existing tests for divergence analysis - passes new tests with irreducible control flow - passes equivalent tests in MIR and GMIR Based on concepts originally outlined by Nicolai Haehnle <nicolai.haehnle@amd.com> With contributions from Ruiling Song <ruiling.song@amd.com> and Jay Foad <jay.foad@amd.com>. Support for GMIR and lit tests for GMIR/MIR added by Yashwant Singh <yashwant.singh@amd.com>. Differential Revision: https://reviews.llvm.org/D130746	2022-12-20 07:22:24 +05:30
Christudasan Devadasan	40ba0942e2	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes so many restrictions that we end up with unsuccessful SGPR spilling when there aren't enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which is an unproblematic case. This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124196	2022-12-17 11:56:32 +05:30
Christudasan Devadasan	b5efec4b27	[CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot With D134950, targets get notified when a virtual register is created and/or cloned. Targets can do the needful with the delegate callback. AMDGPU propagates the virtual register flags maintained in the target file itself. They are useful to identify a certain type of machine operands while inserting spill stores and reloads. Since RegAllocFast spills the physical register itself, there is no way its virtual register can be mapped back to retrieve the flags. It can be solved by passing the virtual register as an additional argument. This argument has no use when the spill interfaces are called during the greedy allocator or even the PrologEpilogInserter and can pass a null register in such cases. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138656	2022-12-17 11:55:34 +05:30
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Joe Nash	bbfbec94b1	[AMDGPU] Enable OMod on more VOP3 instructions OMod was disabled if OpSel was enabled, but that restriction is more specific than necessary. Any VOP3 with float operands can use OMod. On GFX11, FMAC_F16_e64 can use op_sel. Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139469	2022-12-07 13:30:33 -05:00
Piotr Sobczak	1e3abd82b9	[AMDGPU] Fix wide spills Update spill code to account for new vector types with bit widths: 288, 320, 352, 384. Related to D138205. Differential Revision: https://reviews.llvm.org/D139203	2022-12-07 10:23:52 +01:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Jay Foad	49762162ea	[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands. To avoid a profusion of similar functions with undocumented differences, this patch removes all the isLiteralConstant* variants. Callers are responsible for handling the isReg case. Differential Revision: https://reviews.llvm.org/D125759	2022-11-17 16:45:48 +00:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Jay Foad	ea60545b0e	[AMDGPU] Create new instructions in SIInstrInfo::moveToVALU Create new VALU instructions in moveToVALU instead of mutating the existing SALU instruction. This makes it easier to add extra operands so we can convert to the VOP3 form of VALU instructions. NFCI but it does have the minor side effect of removing duplicate implicit operands that were present on the original SALU if they are default implicit operands for the VALU. Differential Revision: https://reviews.llvm.org/D137324	2022-11-04 07:21:11 +00:00
Kazu Hirata	b0e0cdf1c7	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:7383: warning: enumerated mismatch in conditional expression: ‘llvm::AMDGPU::UfmtGFX11::UnifiedFormat’ vs ‘llvm::AMDGPU::UfmtGFX10::UnifiedFormat’	2022-10-30 13:09:59 -07:00
Jay Foad	9bb1e21f07	[AMDGPU] Clean up calls to MachineOperand::setIsDead and friends. NFC.	2022-10-28 10:44:08 +01:00
Carl Ritson	a3646ec1bc	[AMDGPU] Add pseudo wavemode to optimize strict_wqm Strict WQM does not require a WQM transition if it occurs within an existing WQM section. This occurs heavily in GFX11 pixel shaders with LDS_PARAM_LOAD, which leads to unnecessary EXEC mask manipulation. To avoid these transitions, detect WQM -> Strict WQM -> WQM and substitute new ENTER_PSEUDO_WM/EXIT_PSEUDO_WM markers instead. These are treated similarly by the WWM register pre-allocation pass, but do not manipulate EXEC or use registers to save EXEC state. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D136813	2022-10-28 09:45:17 +09:00
Jay Foad	191d70f2f5	[AMDGPU] Use Register in more places in SIInstrInfo. NFC. Also avoid using AMDGPU::NoRegister when it's not needed.	2022-10-25 15:04:58 +01:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Ruiling Song	0404aafbe3	AMDGPU: Factor out hasDivergentBranch(). NFC This is helpful for detecting whether a block ends with divergent branch in passes before lowering the pseudo control flow instructions. Differential Revision: https://reviews.llvm.org/D133184	2022-09-14 13:27:21 +08:00
Matt Arsenault	7834194837	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.	2022-09-12 09:03:37 -04:00
Jay Foad	afa0ed33df	[AMDGPU] Fix shrinking of F16 FMA on newer subtargets D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of the opcode which is used on GFX9+. Differential Revision: https://reviews.llvm.org/D133489	2022-09-08 16:41:04 +01:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
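
Several of the NFC entries above (6443c0ee02, de9d80c1c5) replace pre-C++17 helpers with C++17 language features. The following is a minimal standalone sketch of those two idioms using only the standard library; it is not code from the LLVM tree, and the function names describeWidth and stepsFrom are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// C++17 class template argument deduction (CTAD): the std::pair and
// std::tuple constructors deduce their element types, so the helper
// functions std::make_pair / std::make_tuple are no longer required.
// describeWidth is a hypothetical name, not an LLVM function.
inline std::pair<int, std::string> describeWidth(int bits) {
  assert(bits > 0 && "bit width must be positive");
  std::pair viaCtad(bits, std::string("bits")); // pair<int, std::string>
  std::tuple triple(1, 2.0, 'c');               // tuple<int, double, char>
  static_assert(
      std::is_same_v<decltype(triple), std::tuple<int, double, char>>);
  static_assert(
      std::is_same_v<decltype(viaCtad), std::pair<int, std::string>>);
  return viaCtad;
}

// The standard [[fallthrough]] attribute marks an intentional fall-through
// between case labels, which is the role LLVM_FALLTHROUGH used to play.
// stepsFrom is a hypothetical name, not an LLVM function.
inline int stepsFrom(int start) {
  int steps = 0;
  switch (start) {
  case 3:
    ++steps;
    [[fallthrough]];
  case 2:
    ++steps;
    [[fallthrough]];
  case 1:
    ++steps;
    break;
  default:
    break;
  }
  return steps;
}
```

With these definitions, stepsFrom(3) counts three case labels and stepsFrom(0) counts none, while describeWidth(288) returns the pair (288, "bits").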


725 Commits