llvm-project

Author	SHA1	Message	Date
Akash Dutta	c256966fe2	[AMDGPU]: Unpack packed instructions overlapped by MFMAs post-RA scheduling (#157968 ) This is a cleaned up version of PR #151704. These optimizations are now performed post-RA scheduling.	2025-09-19 09:41:02 -07:00
Matt Arsenault	daed12d00d	AMDGPU: Remove unnecessary AGPR legalize logic (#159491 ) The manual legalizeOperands code only needs to consider cases that require full instruction context to know if the operand is legal. This does not need to handle basic operand register class constraints.	2025-09-19 09:51:46 +09:00
Matt Arsenault	aa8b624518	AMDGPU: Remove unnecessary operand legalization for WMMAs (#159370 ) The operand constraints already express this constraint, and InstrEmitter will respect them.	2025-09-18 09:20:18 +09:00
Matt Arsenault	d57aa484e1	AMDGPU: Constrain regclass when replacing SGPRs with VGPRs (#159369 )	2025-09-18 07:36:28 +09:00
Jay Foad	eeced0d073	[AMDGPU] Use larger immediate values in S_NOP (#158990 ) The S_NOP instruction has an immediate operand which is one less than the number of cycles to delay for. The maximum value that may be encoded in this field was increased in GFX8 and again in GFX12.	2025-09-16 15:51:06 +01:00
Stanislav Mekhanoshin	72aa946762	[AMDGPU] Drop high 32 bits of aperture registers (#158725 ) Fixes: SWDEV-551181	2025-09-16 02:11:39 -07:00
Carl Ritson	fdb06d9792	[AMDGPU] Refactor out common exec mask opcode patterns (NFCI) (#154718 ) Create utility mechanism for finding wave size dependent opcodes used to manipulate exec/lane masks.	2025-09-16 03:22:14 +00:00
Matt Arsenault	1dc4db8f1e	AMDGPU: Relax verifier for agpr/vgpr loads and stores (#158391 )	2025-09-13 16:34:02 +09:00
Matt Arsenault	7289f2cd0c	CodeGen: Remove MachineFunction argument from getRegClass (#158188 ) This is a low level utility to parse the MCInstrInfo and should not depend on the state of the function.	2025-09-12 19:22:02 +09:00
Matt Arsenault	3b48c64d08	AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass (#158246 ) This is special for the same reason av_mov_b64_imm_pseudo is special.	2025-09-12 18:35:57 +09:00
Matt Arsenault	9e1d656c68	AMDGPU: Remove MIMG special case in adjustAllocatableRegClass (#158184 ) I have no idea why this was here. MIMG atomics use tied operands for the input and output, so AV classes should have always worked. We have poor test coverage for AGPRs with atomics, so add a partial set. Everything seems to work OK, although it seems image cmpswap always uses VGPRs unnecessarily.	2025-09-12 09:02:24 +00:00
Matt Arsenault	5a21128f24	AMDGPU: Relax legal register operand constraint (#157989 ) Find a common subclass instead of directly checking for a subclass relationship. This fixes folding logic for unaligned register defs into aligned use contexts. e.g., a vreg_64 def into an av_64_align2 use should be able to find the common subclass vreg_align2. This avoids regressions in future patches. Checking the subclass was also redundant on the subregister path; getMatchingSuperRegClass is sufficient.	2025-09-12 08:57:47 +09:00
Matt Arsenault	1c325a07f8	AMDGPU: Stop checking allocatable in adjustAllocatableRegClass (#158105 ) This no longer does anything.	2025-09-12 08:56:34 +09:00
Petar Avramovic	41c685975e	AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (#157845 ) Use same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD. Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD should be always divergent.	2025-09-10 17:57:15 +02:00
Stanislav Mekhanoshin	b0ee92be94	[AMDGPU] Restrict scale operands of WMMA to low 256 VGPRs (#157526 ) These cannot accept high registers.	2025-09-08 15:44:51 -07:00
Matt Arsenault	727e9f5ea5	CodeGen: Pass SubtargetInfo to TargetGenInstrInfo constructors (#157337 ) This will make it possible for tablegen to make subtarget dependent decisions without adding new arguments to every target. --------- Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>	2025-09-08 12:12:19 +09:00
Matt Arsenault	884130bf93	AMDGPU: Allow folding multiple uses of some immediates into copies (#154757 ) In some cases this will require an avoidable re-defining of a register, but it works out better most of the time. Also allow folding 64-bit immediates into subregister extracts, unless it would break an inline constant. We could be more aggressive here, but this set of conditions seems to do a reasonable job without introducing too many regressions.	2025-09-06 08:22:09 +09:00
Matt Arsenault	d096b1d48e	AMDGPU: Remove flat special case in getRegClass (#156991 )	2025-09-06 07:42:16 +09:00
Stanislav Mekhanoshin	1f0f3473e6	[AMDGPU] High VGPR lowering on gfx1250 (#156965 )	2025-09-04 16:20:47 -07:00
Pierre van Houtryve	e2bd10cf16	[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418 ) - Add clang built-ins + sema/codegen - Add IR Intrinsic + verifier - Add DAG/GlobalISel codegen for the intrinsics - Add lowering in SIMemoryLegalizer using a MMO flag.	2025-09-04 09:19:25 +00:00
Diana Picus	018dc1b397	[AMDGPU] Tail call support for whole wave functions (#145860 ) Support tail calls to whole wave functions (trivial) and from whole wave functions (slightly more involved because we need a new pseudo for the tail call return, that patches up the EXEC mask). Move the expansion of whole wave function return pseudos (regular and tail call returns) to prolog epilog insertion, since that's where we patch up the EXEC mask.	2025-09-04 10:34:43 +02:00
Matt Arsenault	a23a5b0683	AMDGPU: Remove the DS special case in getRegClass (#156696 ) These instructions should now have proper representation with separate instructions for operands which must be paired.	2025-09-04 15:14:17 +09:00
Matt Arsenault	dc170c7e31	AMDGPU: Special case align requirement for AV_MOV_B64_IMM_PSEUDO This should not require aligned registers. Fixes expensive_checks test failure. I don't see a better way until the new system to specify the alignment per register is done.	2025-09-04 09:55:39 +09:00
Matt Arsenault	dd5eb46690	AMDGPU: Fold 64-bit immediate into copy to AV class (#155615 ) This is in preparation for patches which will introduce more copies to av registers.	2025-09-03 09:29:59 +09:00
Matt Arsenault	d7484684e5	AMDGPU: Refactor isImmOperandLegal (#155607 ) The goal is to expose more variants that can operate without preconstructed MachineInstrs or MachineOperands.	2025-09-03 09:06:18 +09:00
Matt Arsenault	d6a72cb300	AMDGPU: Fix fixme for out of bounds indexing in usesConstantBus check (#155603 ) This loop over all the operands in the MachineInstr will eventually go past the end of the MCInstrDesc's explicit operands. We don't need the instr desc to compute the constant bus usage, just the register and whether it's implicit or not. The check here is slightly conservative. e.g. a random vcc implicit use appended to an instruction will falsely report a constant bus use.	2025-09-02 17:25:08 +00:00
Matt Arsenault	e3e1652d18	AMDGPU: Add version of isImmOperandLegal for MCInstrDesc (#155560 ) This avoids the need for a pre-constructed instruction, at least for the first argument.	2025-09-03 01:18:41 +09:00
Chris Jackson	7d0203b39f	[AMDGPU] Prevent generation of unused SGPR IMPLICIT_DEF assignments (#155241 ) Dead VGPR->SGPR copies were converted to IMPLICIT_DEF assignments that were unused. Prevent these from being created and update the numerous affected tests.	2025-08-27 13:18:18 +01:00
Matt Arsenault	de99aabed6	AMDGPU: Remove unused argument from adjustAllocatableRegClass (#155554 )	2025-08-27 06:00:34 +00:00
Matt Arsenault	05f208ac0b	AMDGPU: Stop checking if registers are reserved in adjustAllocatableRegClass (#155125 ) This function is used to implement TargetInstrInfo::getRegClass and conceptually should not depend on the dynamic state of the function.	2025-08-26 20:09:32 +09:00
Matt Arsenault	db024764c1	AMDGPU: Fix not diagnosing unaligned VGPRs for vsrc operands (#155104 ) This was not checking the alignment requirement for 64-bit operands which accept inline immediates. Not all custom operand types were handled in the switch, so round out with explicit handling of all enum values, and change the default to use the default checks for unhandled cases. Fixes #155095	2025-08-25 17:42:58 +09:00
Matt Arsenault	52ed03db59	AMDGPU: Simplify foldImmediate with register class based checks (#154682 ) Generalize the code over the properties of the mov instruction, rather than maintaining parallel logic to figure out the type of mov to use. I've maintained the behavior with 16-bit physical SGPRs, though I think the behavior here is broken and corrupting any value that happens to be live in the high bits. It just happens there's no way to separately write to those with a real instruction but I don't think we should be trying to make assumptions around that property. This is NFC-ish. It now does a better job with imm pseudos which practically won't reach here. This also will make it easier to support more folds in a future patch. I added a couple of new tests with 16-bit extract of 64-bit sources.	2025-08-23 02:13:50 +00:00
Matt Arsenault	2b46f31ee3	AMDGPU: Sign extend immediates for 32-bit subregister extracts (#154870 ) extractSubregFromImm previously would sign extend the 16-bit subregister extracts, but not the 32-bit. We try to consistently store immediates as sign extended, since not doing it can result in misreported isInlineImmediate checks.	2025-08-22 16:50:36 +09:00
Matt Arsenault	fc5fcc0c95	AMDGPU: Start using AV_MOV_B64_IMM_PSEUDO (#154500 )	2025-08-22 13:59:36 +09:00
Matt Arsenault	694a488708	AMDGPU: Add pseudoinstruction for 64-bit agpr or vgpr constants (#154499 ) 64-bit version of 7425af4b7aaa31da10bd1bc7996d3bb212c79d88. We still need to lower to 32-bit v_accvgpr_write_b32s, so this has a unique value restriction that requires both halves of the constant to be 32-bit inline immediates. This only introduces the new pseudo definitions, but doesn't try to use them yet.	2025-08-20 22:54:37 +09:00
Matt Arsenault	ed0e531044	AMDGPU: Use Register type for isStackAccess (#154320 )	2025-08-19 23:00:45 +09:00
Pierre van Houtryve	6f7c77fe90	[AMDGPU] Check noalias.addrspace in mayAccessScratchThroughFlat (#151319 ) PR #149247 made the MD accessible by the backend so we can now leverage it in the memory model. The first use case here is detecting if a flat op can access scratch memory. Benefits both the MemoryLegalizer and InsertWaitCnt.	2025-08-19 07:42:59 +02:00
Stanislav Mekhanoshin	906c9e9542	[AMDGPU] Remove misplaced assert. (#154187 ) The assert that a RegScavenger is required for long branching is now placed below the code that uses s_add_pc64, where the scavenger is actually used.	2025-08-18 13:58:54 -07:00
Stanislav Mekhanoshin	13716843eb	[AMDGPU] Make s_setprio_inc_wg a scheduling boundary (#154188 )	2025-08-18 13:20:38 -07:00
Stanislav Mekhanoshin	ea14834966	[AMDGPU] Per-subtarget DPP instruction classification (#153096 ) This is NFCI at this point.	2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin	dddeb07c2e	[AMDGPU] Restrict packed math FP32 instructions to read only one SGPR per operand on gfx12+ (#152465 ) Sec. 4.6.7.1 of the gfx1250 SPG states that if an SGPR is used as an operand, only one SGPR will be read for both the low and high operations. As a result, the corresponding bits in `op_sel` and `op_sel_hi` must be the same when the operand is an SGPR. Co-authored-by: Tian, Shilei <Shilei.Tian@amd.com>	2025-08-07 16:13:34 -07:00
Shilei Tian	351b38f266	[AMDGPU] Mark address space cast from private to flat as divergent if target supports globally addressable scratch (#152376 ) Globally addressable scratch is a new feature introduced in gfx1250. However, this feature changes how scratch space is mapped into the flat aperture, making address space casts from private to flat no longer uniform.	2025-08-06 17:08:56 -04:00
Changpeng Fang	32161e9de3	[AMDGPU] Do not fold an immediate into instructions with frame indexes (#151263 ) Do not fold an immediate into an instruction that already has a frame index operand. A frame index could possibly turn out to be another immediate. Fixes: SWDEV-536263 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-06 11:47:37 -07:00
Stanislav Mekhanoshin	33abf05af4	[AMDGPU] gfx1250 v_permlane_* instructions (#151749 )	2025-08-01 16:14:19 -07:00
Stanislav Mekhanoshin	ce40863209	[AMDGPU] Add v_cvt_sr|pk_bf8|fp8_f16 gfx1250 instructions (#151415 )	2025-07-30 17:24:45 -07:00
Brox Chen	2a3f72ee6e	[AMDGPU][CodeGen][True16] Correct size calculation for d16 insts (#151042 ) D16 pseudo instructions are introduced in true16 mode to represent a D16 load/store. In MC lowering, the pseudo instructions are lowered to the corresponding D16 Lo/Hi MC Inst respecting the register allocation. However, the pseudo instruction has size 0 and causes an issue in the Inst size estimation. Use D16 Lo when calculating inst size.	2025-07-29 13:01:57 -04:00
Pierre van Houtryve	2ad4e93ded	[AMDGPU][gfx1250] Use SCOPE_SE for stores that may hit scratch (#150586 )	2025-07-28 11:40:56 +02:00
Changpeng Fang	400ce1a3d3	[AMDGPU] Support AMDGPUClamp for bf16 on gfx1250 (#150663 ) The scalar version uses V_MAX_BF16_PSEUDO, which is expanded to V_PK_MAX_BF16 with unused high bits. If V_PK_MAX_BF16 were produced directly instead, that would create a problem with folding the clamp into other scalar instructions due to incompatible clamp bits. FIXME-TRUE16: enable bf16 clamp with true16 --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-25 12:13:06 -07:00
Jay Foad	8005c6a108	[AMDGPU] Simplify SIInstrInfo::isLegalToSwap. NFC. (#149058 )	2025-07-25 13:02:34 +01:00
Stanislav Mekhanoshin	2346968807	[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291 )	2025-07-23 13:17:56 -07:00


1005 Commits
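The S_NOP commit (#158990) describes simple arithmetic. An S_NOP with immediate `imm` delays for `imm + 1` cycles. A longer wait therefore has to be split across several S_NOPs, and a larger encodable maximum means fewer instructions.

The C++ sketch below illustrates that split. `splitSNopDelay` is a hypothetical helper, not an LLVM API. The per-generation maximum immediate is passed in as a parameter rather than hard-coded, because the log does not state the exact limits for each GFX generation.

```cpp
#include <vector>

// Hypothetical helper (not an LLVM API): returns the S_NOP immediates needed
// to delay for `cycles` cycles. An immediate of `imm` delays for imm + 1
// cycles, and `maxImm` is the largest immediate the target can encode
// (assumed to be supplied by the caller).
std::vector<unsigned> splitSNopDelay(unsigned cycles, unsigned maxImm) {
  const unsigned maxCycles = maxImm + 1; // most cycles one S_NOP can cover
  std::vector<unsigned> imms;
  while (cycles > 0) {
    unsigned chunk = cycles < maxCycles ? cycles : maxCycles;
    imms.push_back(chunk - 1); // encode one less than the cycle count
    cycles -= chunk;
  }
  return imms;
}
```

For example, a 20-cycle wait with a maximum immediate of 7 becomes the immediates {7, 7, 3}. A maximum of 15 needs only {15, 3}.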
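Commit #154870 makes 32-bit sub-register extracts of an immediate sign-extended, matching the existing 16-bit behavior, so that immediates are stored consistently.

The C++ sketch below shows the idea under stated assumptions. `SubReg`, `signExtend` and `extractImmPart` are illustrative names, not LLVM's `extractSubregFromImm` or its sub-register indices. Only a low-16, low-32 and high-32 selector are modeled.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sub-register selectors for a 64-bit immediate (hypothetical,
// not LLVM's sub-register indices).
enum class SubReg { Lo16, Lo32, Hi32 };

// Sign-extend the low `width` bits of `value` to 64 bits without relying on
// implementation-defined narrowing conversions.
int64_t signExtend(uint64_t value, unsigned width) {
  assert(width > 0 && width < 64 && "width must be in (0, 64)");
  const uint64_t mask = (uint64_t{1} << width) - 1;
  const uint64_t signBit = uint64_t{1} << (width - 1);
  const uint64_t low = value & mask;
  // Flipping the sign bit and subtracting it maps values with the sign bit
  // set onto the negative range and leaves the others unchanged.
  return static_cast<int64_t>(low ^ signBit) - static_cast<int64_t>(signBit);
}

// Extract part of a 64-bit immediate, keeping the result sign-extended for
// both 16-bit and 32-bit sub-registers.
int64_t extractImmPart(int64_t imm, SubReg sub) {
  const uint64_t bits = static_cast<uint64_t>(imm);
  switch (sub) {
  case SubReg::Lo16: return signExtend(bits, 16);
  case SubReg::Lo32: return signExtend(bits, 32);
  case SubReg::Hi32: return signExtend(bits >> 32, 32);
  }
  return 0; // unreachable for valid selectors
}
```

With sign extension, the low half of `0x1FFFFFFFF` is stored as -1, which is an inline constant. If it were zero-extended, it would be stored as 4294967295 and an inline-immediate check would reject it. This is the kind of misreport the commit message describes.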
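Commit #154499 notes that the 64-bit AGPR/VGPR constant pseudo is still lowered to 32-bit v_accvgpr_write_b32 instructions. Both halves of the constant must therefore be 32-bit inline immediates.

The C++ sketch below checks that condition. It is a simplification: it treats only integers in [-16, 64] as inline constants and deliberately ignores the floating-point inline constants that real targets also accept. The function names are hypothetical.

```cpp
#include <cstdint>

// Simplified check (assumption): a 32-bit pattern counts as inline only if
// it is an integer in [-16, 64]. Floating-point inline constants are ignored.
bool isInlineInt32(uint32_t bits) {
  // -16..-1 appear as 0xFFFFFFF0..0xFFFFFFFF when viewed as unsigned.
  return bits <= 64u || bits >= 0xFFFFFFF0u;
}

// Hypothetical predicate: a 64-bit constant fits the pseudo's restriction
// only if both 32-bit halves are inline immediates.
bool bothHalvesInline(uint64_t imm) {
  const uint32_t lo = static_cast<uint32_t>(imm);
  const uint32_t hi = static_cast<uint32_t>(imm >> 32);
  return isInlineInt32(lo) && isInlineInt32(hi);
}
```

For instance, `0x0000004000000010` (halves 64 and 16) qualifies. `0x0000004100000000` does not, because 65 is outside the inline range.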
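Commit #152465 cites Sec. 4.6.7.1 of the gfx1250 SPG. For packed FP32 instructions, an SGPR operand is read only once for both the low and high operations, so its `op_sel` and `op_sel_hi` bits must match.

The C++ sketch below expresses that rule as a bitmask test. It assumes a hypothetical encoding in which bit i of each mask refers to source operand i. This is an illustration, not LLVM's operand representation.

```cpp
// Hypothetical model of a packed FP32 instruction's modifiers: bit i of
// opSel / opSelHi applies to source operand i, and bit i of sgprSrcMask is
// set when source i is an SGPR.
bool sgprOpSelBitsMatch(unsigned opSel, unsigned opSelHi,
                        unsigned sgprSrcMask) {
  // Only one SGPR is read for both halves, so for every SGPR source the
  // op_sel and op_sel_hi bits must agree (their XOR must be zero there).
  return ((opSel ^ opSelHi) & sgprSrcMask) == 0;
}
```

Using XOR keeps the check to a single mask operation. VGPR sources, whose bits are cleared in `sgprSrcMask`, remain free to select different halves.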