llvm-project

Author	SHA1	Message	Date
Nicolai Hähnle	2eb767c9e1	AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287) Scratch instructions are always in addrspace(5), which can only alias with flat (and itself). SMEM and buffer instructions can never reference those address spaces, so they are trivially disjoint.	2023-09-08 07:43:36 +02:00
Stanislav Mekhanoshin	294f632859	[AMDGPU] Move add64/sub64 to VALU This is NFCI as far as I can tell, but I see no reason not to do it. Differential Revision: https://reviews.llvm.org/D159077	2023-08-29 09:31:04 -07:00
Mateusz Hurnik	232f0c9a9a	[NFC][AMDGPU] Remove redundant code As the result of this constant function is unused it is redundant. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D158747	2023-08-25 11:18:48 +01:00
Stanislav Mekhanoshin	cfe9a134bb	[AMDGPU] Rename 64BitDPP feature and fix the checks Names '64BitDPP' and especially 'DPP64' were found misleading, and DPP64 can easily be mixed with DPP16 and DPP8 while these are different concepts. DPP16 and DPP8 refers to lanes where DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465	2023-08-22 11:00:10 -07:00
Mirko Brkusanin	de82fde22d	AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D157091	2023-08-18 18:23:40 +02:00
Christudasan Devadasan	81827f8cfb	[AMDGPU] Support wwm-reg AV spill pseudos The wwm register spill pseudos are currently defined for VGPR_32 regclass. It causes a verifier error for gfx908 or above as the regalloc sometimes restores the values to the vector superclass AV_32. Fixing it by supporting AV wwm-spill pseudos as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155646	2023-08-17 20:04:18 +05:30
Stanislav Mekhanoshin	f7480bc5c1	[AMDGPU] Decouple V_PK_MOV_B32 from FeaturePackedFP32Ops This is not an FP32 operation. Differential Revision: https://reviews.llvm.org/D157909	2023-08-14 14:22:52 -07:00
Mirko Brkusanin	1e5359c6ba	[AMDGPU] Treat KIMM32 and KIMM16 operand types as noninlinable While they represent 32/16 bit immediate values they are already included in encoding of the instructions that use them and are not true literals. FMAMK and FMAAK instructions that use them are marked with fixed size so getInstSizeInBytes will not increase the size for these operands. We also add tests whose logic relies on KIMM16 and KIMM32 being considered not inlinable. Differential Revision: https://reviews.llvm.org/D157624	2023-08-11 18:46:39 +02:00
Matt Arsenault	4b1702e87a	AMDGPU: Fix counting source modifiers as literal constants This fixes over estimating code size. This was broken by 79f52af4cd9a76485dd50bcdbb5d393eb7a70103. https://reviews.llvm.org/D157103	2023-08-07 18:40:16 -04:00
Sander de Smalen	bbb95893de	[TII] NFCI: Simplify the interface for isTriviallyReMaterializable Currently `isTriviallyReMaterializable` calls `isReallyTriviallyReMaterializable` and `isReallyTriviallyReMaterializableGeneric`. The two interfaces are confusing, but there are also some real issues with this. The documentation of this function (see below) suggests that `isReallyTriviallyRematerializable` allows the target to override the default behaviour. /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is /// set, this hook lets the target specify whether the instruction is actually /// trivially rematerializable, taking into consideration its operands. It however implements something different. The default behaviour is the analysis done in `isReallyTriviallyReMaterializableGeneric`, which is testing if it is safe to rematerialize the MachineInstr. The result of `isReallyTriviallyReMaterializable` is only considered if `isReallyTriviallyReMaterializableGeneric` returns `false`. That means there is no way to override the default behaviour if `isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to rematerialize, but we'd rather not). By making this a single interface, we can override the interface to do either. Reviewed By: craig.topper, nemanjai Differential Revision: https://reviews.llvm.org/D156520	2023-08-07 13:01:06 +00:00
Jay Foad	e61ca23289	[AMDGPU] Add and use SIInstrFlags::GWS. NFC. This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099	2023-08-07 12:05:14 +01:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Sameer Sahasrabuddhe	7c760b224b	Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556 This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.	2023-07-27 14:49:17 +05:30
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Sameer Sahasrabuddhe	d0f7850b01	Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. The changes did not cover all occurrences of the deleted function MachineInstr::getIntrinsicID().	2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe	baa3386edb	[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556	2023-07-27 10:00:45 +05:30
Stanislav Mekhanoshin	7972b9c829	[AMDGPU] Move SIEncodingFamily into SIDefines.h. NFC. I need this for future patch in the MC, while TII is not available in the llvm-mc. Besides this is not a first time I want it there. Differential Revision: https://reviews.llvm.org/D155228	2023-07-13 12:42:28 -07:00
Christudasan Devadasan	b4a62b1fa5	[AMDGPU] Enable whole wave register copy So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patch, we want the allocator to efficiently allocate them rather than using the custom pre-allocation pass. Any liverange split of virtual registers involved in whole-wave operations require the resulting COPY introduced with the split to be performed for all lanes. It isn't implemented in the compiler yet. This patch would identify all such copies and manipulate the exec mask around them to enable all lanes without affecting the value of exec mask elsewhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143762	2023-07-07 22:58:55 +05:30
Christudasan Devadasan	b78b36e1a2	[AMDGPU] Implement whole wave register spill To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spill loads and restores should be activated for all lanes by temporarily flipping all bits in exec register to one just before the spills. It is not implemented in the compiler as of today and this patch enables the necessary support. This is a pre-patch before the SGPR spill to virtual VGPR lanes that would eventually cause the whole wave register spills during allocation. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D143759	2023-07-07 22:51:45 +05:30
Ivan Kosarev	9d8171f8c4	[AMDGPU][Codegen] Clean up legalizeOpWithMove(). The removed logic was added in <https://reviews.llvm.org/rG0c93c9ecee0624f8469f5a971a09fbc9e9cc1061>, but now doesn't seem to be needed. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154220	2023-06-30 16:48:16 +01:00
Brendon Cahoon	853b2a84cb	[AMDGPU] Reserve SGPR pair when long branches are present Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register scavenger may not find available registers, which causes a “did not find scavenging index” assert to occur in assignRegToScavengingIndex. In this patch, we estimate before register allocation whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. The distance threshold is an approximation as the exact code size and branch distance are unknown prior to register allocation. Patch by Corbin Robeck. Thanks! Differential Review: https://reviews.llvm.org/D149775	2023-06-29 16:50:46 -05:00
Ivan Kosarev	4e312abdfd	[AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152257	2023-06-07 11:41:11 +01:00
Jay Foad	3030c03988	[AMDGPU] Make use of MachineInstr::all_defs and all_uses. NFCI.	2023-06-05 10:32:33 +01:00
Carl Ritson	2e87ed80b2	[AMDGPU] WQM: Allow insertion of exact mode transition as terminator Allow WQM pass to insert transitions to exact mode among block terminators, instead of forcing them to occur before terminators. This should not yield any functional change, but allows block splitting of control flow, such as that in D145329. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D151797	2023-06-02 14:00:54 +09:00
Jay Foad	94e48d433d	[AMDGPU] Switch to backwards scavenging in non-spill cases When the scavenger is not allowed to spill, the only difference between forward and backward should be the heuristics used to pick an available register. Forwards scavenging tries to pick a register that can be used again later in the BB; backwards scavenging tries to pick one that can be used earlier. Backwards scavenging is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D151323	2023-05-24 15:03:33 +01:00
Jay Foad	3c0d81d43f	[AMDGPU] Simplify scavenging in indirectCopyToAGPR This just makes it clearer that we do not want the scavenger to spill here. NFCI. Differential Revision: https://reviews.llvm.org/D150774	2023-05-22 11:55:26 +01:00
Jay Foad	64c938e8e3	[AMDGPU] Avoid RegScavenger::forward in copyPhysReg/indirectCopyToAGPR RegScavenger::backward is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D150571	2023-05-16 15:51:31 +01:00
Jeffrey Byrnes	88149fb3f4	[AMDGPU][GFX908] IndirectCopyToAGPR: Confirm modified register is dst reg of accvgpr_write IndirectCopyToAGPR should be reworked as to avoid optimizing during copy lowering. However, as it stands, the code is buggy. This patch replaces the call to definesRegister with modifiesRegister, and confirms that the dest reg of the found accvgpr_write is in fact the src reg of our copy. Differential Revision: https://reviews.llvm.org/D149873 Change-Id: Id8a61659ac15565dcb970069d0624f0925a46e6d	2023-05-12 12:38:29 -07:00
skc7	e016fb57b3	[AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic. Legalize soffset of buffer instructions using waterfall loop. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141030	2023-04-27 19:36:50 +05:30
Janek van Oirschot	124acb7ca3	[AMDGPU] Fix negative offset values interpretation in getMemOperandsWithOffset for DS The offset values may result in an erroneous scheduling of a load before write for a memory location if the offset values are represented as negative values in MIR, despite actually being unsigned values. This representation in MIR happens as SelectionDAG::getConstant could go through APInt to represent the encoding which assumes the MSB of the encoding as a sign-bit, regardless of whether it is supposed to be a signed value. The 8-bit negative (interpreted) value gets cast to an unsigned 32 bit value in getMemOperandsWithOffset used for comparisons in areMemAccessesTriviallyDisjoint eventually leading to an erroneous schedule in the machine scheduler. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D149080	2023-04-26 14:10:25 +01:00
Kazu Hirata	972983539b	[llvm] Apply fixes from readability-redundant-control-flow (NFC)	2023-04-16 00:13:46 -07:00
Diana Picus	b9ba05360e	[AMDGPU] Don't S_MOV_B32 into $scc The peephole optimizer tries to replace ``` %n:sgpr_32 = S_MOV_B32 x $scc = COPY %n ``` with a `S_MOV_B32` directly into `$scc`. This crashes because `S_MOV_B32` cannot take `$scc` as input. We currently generate code like this from GlobalISel when lowering a G_BRCOND with a constant condition. We should probably look into removing this kind of branch altogether, but until then we should at least not crash. This patch fixes the issue by making sure we don't apply the peephole optimization when trying to move into a physical register that doesn't belong to the correct register class. Differential Revision: https://reviews.llvm.org/D148117	2023-04-14 10:24:43 +02:00
skc7	b434051dc8	[AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147168	2023-04-10 11:34:14 +05:30
Mateja Marjanovic	48f6964bcb	[AMDGPU][GlobalISel] Add support for S_INDIRECT_REG_WRITE_MOVREL_B32_V[9|10|11|12]	2023-03-30 18:27:49 +02:00
Jay Foad	3e3594c771	[AMDGPU] Do not fix implicit vcc operand on INLINEASM An INLINEASM can have an implicit def of vcc. It is not appropriate for fixImplicitOperands to change this to vcc_lo on wave32. Differential Revision: https://reviews.llvm.org/D147157	2023-03-29 20:23:36 +01:00
Mirko Brkusanin	2eada459c7	[AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16 Differential Revision: https://reviews.llvm.org/D145785	2023-03-10 14:47:49 +01:00
Jay Foad	08bdff862c	[AMDGPU] Fix error message for illegal copy	2023-03-03 11:46:01 +00:00
ZHU Zijia	8fccdfa436	[AMDGPU] Remove outdated FIXME in comments [NFC] This case has already been handled by D106449.	2023-03-03 01:34:19 +08:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of remaining addresses. Update legalizer to pack register for this new format and allow instruction selection to use NSA encoding when number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Piotr Sobczak	a3d7b3121c	[AMDGPU][NFC] Add getMaxMUBUFImmOffset Replace magic constant 4095 with the function getMaxMUBUFImmOffset(). Differential Revision: https://reviews.llvm.org/D144623	2023-02-23 11:29:59 +01:00
Jay Foad	c9f4df57ca	[AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should not be calling into GCNSubtarget. Differential Revision: https://reviews.llvm.org/D144564	2023-02-22 16:19:05 +00:00
Yashwant Singh	cde2f330b3	[AMDGPU] Introduce never uniform bit field in tablegen IsNeverUniform can be set to 1 to mark instructions which are inherently never-uniform/divergent. Enabling this bit in Writelane instruction for now. To be extended to all required instructions. Reviewed By: arsenm, sameerds, #amdgpu Differential Revision: https://reviews.llvm.org/D143154	2023-02-08 11:45:48 +05:30
Yashwant Singh	422d379de2	[AMDGPU] Use tablegen to list uniform intrinsics Right now we do opcode-wise matching to identify uniform/non-divergent AMDGPU intrinsics. It is duplicated in 2 places: once at IR-level uniformity analysis and once at MIR level. Moving them to a single tablegen table for consistency and adding an API wrapper to access them. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D142961	2023-01-31 17:44:40 +05:30
Matt Arsenault	4002576156	AMDGPU/GlobalISel: Partially fix getGenericInstructionUniformity This was broken for the common case of instructions which are uniform if their inputs are uniform. This is broken for control flow intrinsics since the API currently does not express which result operand is in question. This generates failures in just about every intrinsic test when uniformity analysis is performed without this.	2023-01-30 15:47:18 -04:00
Matt Arsenault	490e348e67	AMDGPU: Partially fix machine uniformity for inline asm This was assuming virtual registers only, and asserting on physical. This was also ignoring AGPRs, and only considering VGPRs. Reporting the instruction as uniform or not is conceptually wrong, this should be reported per-operand. An inline asm statement could include uniform and non-uniform components. This should report purely for the register defs and ignore the uses. Fixes asserting on most of the inline asm tests when uniformity analysis is used.	2023-01-30 15:47:18 -04:00
Matt Arsenault	17ce615c78	AMDGPU: Fix null dereference in getInstructionUniformity This was failing when it couldn't find an allocatable class for special physical register inputs (like $mode), which are all scalars. This avoids numerous test failures when regbankselect is updated to use uniformity analysis.	2023-01-30 15:47:17 -04:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Jay Foad	073401e59c	[MC] Define and use MCInstrDesc implicit_uses and implicit_defs. NFC. The new methods return a range for easier iteration. Use them everywhere instead of getImplicitUses, getNumImplicitUses, getImplicitDefs and getNumImplicitDefs. A future patch will remove the old methods. In some use cases the new methods are less efficient because they always have to scan the whole uses/defs array to count its length, but that will be fixed in a future patch by storing the number of implicit uses/defs explicitly in MCInstrDesc. At that point there will be no need to 0-terminate the arrays. Differential Revision: https://reviews.llvm.org/D142215	2023-01-23 14:44:58 +00:00
Jay Foad	245e3dd948	[MC] Do not copy MCInstrDescs. NFC. Avoid copying MCInstrDesc instances because a future patch will change them to find their implicit operands and operand info array based on their own "this" pointer, so it will only work for MCInstrDescs in the TargetInsts table, not for a copy of an MCInstrDesc at a different address. Differential Revision: https://reviews.llvm.org/D142214	2023-01-23 11:55:49 +00:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
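
Commit 124acb7ca3 above describes an 8-bit unsigned DS offset that appears in MIR as a negative value and is then cast to an unsigned 32-bit value, which makes overlapping accesses look disjoint. The following is a minimal standalone sketch of that arithmetic only. It is not the code from the patch: the helper names `offsetAsStored`, `offsetMasked`, and `looksDisjoint` are hypothetical, the 8-bit mask is assumed from the commit message, and the actual fix in LLVM may be written differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the LLVM tree): the problematic path, where a
// sign-extended immediate is cast straight to an unsigned 32-bit offset.
constexpr uint32_t offsetAsStored(int64_t Imm) {
  return static_cast<uint32_t>(Imm);
}

// Hypothetical helper: recover the intended unsigned 8-bit offset by keeping
// only the low 8 bits of the immediate before widening it.
constexpr uint32_t offsetMasked(int64_t Imm) {
  return static_cast<uint32_t>(Imm & 0xFF);
}

// Simplified overlap test for illustration: two accesses are treated as
// disjoint when the lower one ends at or before the higher one begins.
constexpr bool looksDisjoint(uint32_t Off0, uint32_t Width0,
                             uint32_t Off1, uint32_t Width1) {
  const uint64_t Low = Off0 < Off1 ? Off0 : Off1;
  const uint64_t High = Off0 < Off1 ? Off1 : Off0;
  const uint64_t LowWidth = Off0 < Off1 ? Width0 : Width1;
  return Low + LowWidth <= High;
}

// The encoded offset 0xF8 (248) reads as -8 once its top bit is taken as a sign.
constexpr int64_t StoreImm = -8;  // write of 4 bytes at offset 248
constexpr int64_t LoadImm = 250;  // read of 4 bytes at offset 250

// Cast directly: -8 becomes 4294967288, so the accesses wrongly look disjoint.
static_assert(offsetAsStored(StoreImm) == 4294967288u, "wraps to a huge offset");
static_assert(looksDisjoint(offsetAsStored(StoreImm), 4,
                            offsetAsStored(LoadImm), 4),
              "overlap is missed");

// Masked to 8 bits first: the offsets are 248 and 250, and the overlap is seen.
static_assert(offsetMasked(StoreImm) == 248u, "recovers the unsigned offset");
static_assert(!looksDisjoint(offsetMasked(StoreImm), 4,
                             offsetMasked(LoadImm), 4),
              "overlap is detected");
```

The checks are written as `static_assert` so that the sketch verifies itself at compile time. Once the two accesses are reported as overlapping, a scheduler relying on such a test would have no basis for moving the load ahead of the write.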

755 Commits